Preface


This volume contains the proceedings of the 29th International Conference on Automated 
Deduction (CADE-29). CADE is the major forum for the presentation of research in all 
aspects of automated deduction, including foundations, applications, implementations, 
and practical experience. CADE-29 was held on 1–4 July 2023, hosted at the Faculty
of Civil and Industrial Engineering of the Sapienza University of Rome, Italy, and co- 
located with the 8th International Conference on Formal Structures for Computation 
and Deduction (FSCD). CADE-29 emphasized the breadth of topics that are of inter- 
est, including applications in and beyond computer science and mathematics, and the 
use/contribution of automated deduction in AI. 

The Program Committee (PC) examined 74 submissions this year and decided to 
accept 33 of them (28 full papers and 5 short papers or system descriptions). Submissions 
were single-blind and each of them was reviewed by at least three PC members or their 
external reviewers. The criteria for evaluation were originality and significance, technical 
quality, comparison with related work, quality of presentation, and reproducibility of 
experiments. 

The program of the conference included three invited talks, two of which were joint 
talks with FSCD: 


— “Lambda-Superposition: From Theory to Trophy” by Jasmin Blanchette, Ludwig- 
Maximilians-Universität München, Germany

— “Nominal Techniques for Software Specification and Verification” by Maribel 
Fernández, King’s College London, UK (joint talk)

— “Can we trust AI?” by Mateja Jamnik, University of Cambridge, UK (joint talk) 


A fourth invited talk, “Automated Reasoning with Data,” was given by Moshe Vardi 
as recipient of the 2023 Herbrand Award. 
The conference hosted several workshops and one competition on July 4–6:


— ADeMaL: Automated Deduction for Machine Learning

— Vampire 2023: The 7th Vampire Workshop 

— ThEdu’23: Theorem Proving Components for Educational Software 

— SMT 2023: The 21st International Workshop on Satisfiability Modulo Theories

— CASC 2023: The CADE ATP System Competition 


In addition to the best paper awards, three CADE awards were presented at the 
conference: 


— The 2023 Herbrand Award for Distinguished Contributions to Automated Reasoning, 
awarded to Moshe Y. Vardi, Rice University, USA, in recognition of his many foun- 
dational contributions to logic and automated reasoning, in particular automata-based 
verification methods, constraint solving, and knowledge representation. 

— The Thoralf Skolem Award for CADE papers that have passed the test of time by 
being the most influential papers in the field, awarded to each of the following papers: 
• “Deciding Combinations of Theories” by Robert E. Shostak, CADE-6 (1982)

• “The TPTP Problem Library” by Geoff Sutcliffe, Christian B. Suttner, and Theodor
Yemenis, CADE-12 (1994)

• “A SAT Based Approach for Solving Formulas over Boolean and Linear Mathematical
Propositions” by Gilles Audemard, Piergiorgio Bertoli, Alessandro Cimatti,
Artur Kornilowicz, and Roberto Sebastiani, CADE-18 (2002)

• “The Tree Width of Separation Logic with Recursive Definitions” by Radu Iosif,
Adam Rogalewicz, and Jiri Simacek, CADE-24 (2013)


— The Bill McCune PhD Award for a PhD thesis’ substantive contributions to the 
field of Automated Reasoning, awarded to Alessandro Gianola, Free University of 
Bozen-Bolzano, Italy. 


Sincere thanks go to the many people who contributed to the success of CADE-29 
— the authors, the participants, the invited speakers, the members of the PC, the external 
subreviewers, the general chair, the workshop and tutorial chair, the publicity chair, the 
staff at Springer, and the EasyChair team. 

CADE-29 gratefully acknowledges the support of the CADE trustees, the board of 
the Association for Automated Reasoning, ACM SIGLOG, and the sponsors Amazon 
Web Services and Springer. 
λ-Superposition: From Theory to Trophy


Jasmin Blanchette¹,²,³
³ Université de Lorraine, CNRS, Inria, LORIA, Nancy, France


This extended abstract describes work performed in collaboration with Alexander Bentkamp,
Simon Cruanes, Visa Nummelin, Stephan Schulz, Sophie Tourret, Petar Vukmirović,
and Uwe Waldmann on the design and implementation of λ-superposition, in
the context of the Matryoshka research project.

When I conceived Matryoshka in 2015, my ambition was to develop higher-order
provers that perform well on higher-order proof obligations originating from
Isabelle/HOL [11] and other proof assistants. Lawrence Paulson had noticed that the 
performance on truly higher-order goals left much to be desired and “given the inherent 
difficulty of performing higher-order reasoning using first-order theorem provers, the 
way forward is to integrate Sledgehammer with an actual higher-order theorem prover, 
such as LEO-II” [13]. However, the subsequent integration of LEO-II [4] and Satallax
[7] failed to bring the expected benefits [16]. My hypothesis was that most Isabelle 
problems have a large first-order component and the existing higher-order provers were 
not optimized for this kind of reasoning. 

To obtain higher-order provers that excel at first-order reasoning, I proposed to 
start with a highly successful first-order calculus, superposition, and generalize it, as 
much as possible, in a graceful way, culminating with a higher-order calculus. Provers 
implementing this calculus would combine the strengths of native higher-order provers 
and the strengths of the superposition provers that served as Sledgehammer backends: 
E [14], SPASS [6], and Vampire [5]. 

To tackle the challenge of designing this calculus, which we call λ-superposition, we
identified three milestones that we reached in turn. We first designed a superposition-like
calculus for a λ-free, Boolean-free higher-order logic (also called applicative first-order
logic) [1]. This logic supports partial application of function symbols (e.g., f or f a, where 
f is binary) and application of variables (e.g., y a). Already at this stage, the first serious 
issue arose with the term order that superposition uses to prune the search space. We 
were able to work around the issue by introducing a new inference rule called argument 
congruence. For this and the other milestones, much of the work went into ensuring 
refutational completeness. 

For the second milestone, we designed a superposition-like calculus for a logic that 
supports λ-abstractions but not interpreted Booleans [3]. One difficulty that arose is
that inferences need to perform higher-order unification. Unfortunately, higher-order 
unification is ill-behaved: It is undecidable and can yield a possibly infinite stream of
unifiers. Moreover, due to interactions with the term order, we need to perform full 
unification (including flex-flex pairs) [17] and not simply preunification [10]. 

For the third milestone, we added support for interpreted Booleans [2]. This step 
was based on ideas by Ganzinger and Stuber [9]. They showed how to support logical 
symbols inside a superposition-like calculus, but fell short of including an interpreted 
Boolean type. Thus, we extended Ganzinger and Stuber’s work [12] and used it as the 
basis of a graceful generalization to higher-order logic. 

Whenever we designed a calculus, we also made sure to implement it in the Zip- 
perposition prover [8]. Zipperposition was originally developed by Cruanes to explore 
induction, arithmetic, and deduction modulo. It is written in OCaml and is highly extensible.
He extended it with a pragmatic higher-order mode with support for λ-abstractions
and extensionality, without any completeness guarantees. This mode formed the basis 
for our subsequent work. Empirical evaluations on TPTP and Sledgehammer bench- 
marks were initially disappointing, but after some extensive tuning and new ideas for 
heuristics, Zipperposition became highly competitive, finishing first in the higher-order 
theorem division of the CADE ATP System Competition (CASC) in 2020, 2021, and 
2022. Inspired by a similar integration in Leo-III [15] and Satallax, Zipperposition 
incorporates E as a backend to tackle first-order subproblems. 

We also implemented λ-superposition in the high-performance prover E [18,19]. The
E implementation is pragmatic and sacrifices completeness. For example, the possibly 
infinite stream of unifiers is truncated to make it finite, and some of the most explosive 
rules of λ-superposition are omitted. Probably because Zipperposition has a portfolio of
modes extensively tuned against the TPTP library and uses a version of E as a backend, 
E finished only second in the higher-order theorem division of CASC 2022. On the other 
hand, E finished first in the Sledgehammer division of the same competition. Despite 
this, the performance improvement over Sledgehammer’s first-order backends is small. 
I suspect that Isabelle problems are even more first-order than I thought. 

We learned a few other lessons in the process: 


e The identification of reasonable milestones was invaluable. 

e The completeness proofs gave us some useful guidance as we designed the calculi, 
even if it turns out that the best empirical modes are incomplete. 

e Another useful guide was the design goal of achieving, as much as possible, a graceful 
generalization, preserving the features that make standard superposition successful 
on first-order problems. 

e Disappointing evaluations can simply mean that more fine-tuning and heuristics are 
needed. 

e The presence of many complementary modes in a well-tuned portfolio can be as 
important as a highly efficient implementation. 


Acknowledgment. I thank Alexander Bentkamp, Stephan Schulz, Mark Summer- 
field, Petar Vukmirović, and Uwe Waldmann for textual suggestions. This research 
has received funding from the European Research Council (ERC) under the Euro- 
pean Union’s Horizon 2020 research and innovation program (grant No. 713999, 
Matryoshka). This research has also received funding from the Netherlands Organization 
for Scientific Research (NWO) under the Vidi program (project No. 016.Vidi.189.037,
Lean Forward). 
Abstract. In this talk we discuss the nominal approach to the specification
of languages with binders and some applications to programming
languages and verification.


Keywords: Binding Operator · Nominal Logic · Nominal Rewriting ·
Unification · Equational Axioms · Type Systems


Overview 


The nominal approach to the specification of languages with binding operators, intro- 
duced by Gabbay and Pitts [20, 21, 28], has its roots in nominal set theory [27]. Its user- 
friendly syntax and first-order presentation (indeed, nominal logic [25, 26] is defined as 
a theory in first-order logic) makes formal reasoning about binding operators similar to 
conventional on-paper reasoning. 

Nominal logic uses the well-understood concept of permutation groups acting on 
sets to provide a rigorous, first-order treatment of common informal practice to do with 
fresh and bound names. Nominal matching and nominal unification [36, 37] (which
work modulo α-equivalence) are decidable, and efficient algorithms exist [7, 8, 9, 22],
which are the basis for efficient implementations of nominal rewriting [17–19, 34].

A number of systems (such as Nominal Isabelle [35]) highlighted the benefits of the 
nominal approach, which gave rise to elegant formalisations of Gödel’s theorems [24]
and the π-calculus [5] and to advances in programming language semantics [23]. However,
there are still some obstacles to the inclusion of nominal features in programming
languages and verification environments. 

In this talk, I will present our current work towards incorporating nominal techniques 
into two widely-used rule-based first-order verification environments: the K specification 
framework [30] and the Maude programming language [11, 12]. 

An important component of rule-based programming and verification environments 
is the algorithm used to check equivalence of terms and to solve equations (unification). In 
practice, unification problems arise in the context of equational axioms (e.g., to take into 
account associative and commutative (AC) operators [6, 13, 14, 32, 33]). The first part of 
the talk will discuss notions of α-equivalence modulo associativity and commutativity
axioms [1], extensions of nominal matching and unification to deal with AC operators
[2], and the use of nominal narrowing [3] to deal with equational theories presented by 
convergent nominal rewriting rules. 

Another important component of rule-based programming and verification environ- 
ments is the type system. In the second part of the talk, I will discuss type systems 
for nominal languages (including polymorphic systems [15] and intersection systems 
[4]). Dependent type theories, the dominant approach to formalising programming lan- 
guages, have been extended with nominal features [10, 29, 31]. A lambda-less nominal 
dependent type system is available [16] and we are currently working on a type checker 
for this system. 

The talk is structured as follows: we will start with the definition of nominal logic 
(including the notions of fresh atoms and alpha-equivalence) followed by a brief intro- 
duction to nominal matching and unification. We will then define nominal rewriting, a 
generalisation of first-order rewriting that provides in-built support for alpha-equivalence 
following the nominal approach. Finally, we will discuss notions of nominal unification 
and rewriting modulo AC operators and briefly overview typed versions of nominal 
languages. 


Acknowledgements. I am grateful to my PhD students and co-authors for many fruitful 
collaborations. 
Abstract. In the last couple of decades, developments in SAT-based
optimization have led to highly efficient maximum satisfiability
(MaxSAT) solvers, but in contrast to the SAT solvers on which MaxSAT 
solving rests, there has been little parallel development of techniques 
to prove the correctness of MaxSAT results. We show how pseudo- 
Boolean proof logging can be used to certify state-of-the-art core-guided 
MaxSAT solving, including advanced techniques like structure sharing, 
weight-aware core extraction and hardening. Our experimental evalua- 
tion demonstrates that this approach is viable in practice. We are hope- 
ful that this is the first step towards general proof logging techniques for 
MaxSAT solvers. 


Keywords: MaxSAT · core-guided search · proof logging · certifying
algorithms 


1 Introduction 


Combinatorial optimization is one of the most impressive, and most intriguing, 
success stories in computer science. This area deals with computationally very 
challenging problems, which are widely believed to require exponential time in 
the worst case [21,49]. In spite of this, during the last couple of decades aston- 
ishing progress has been made on so-called combinatorial solvers for a number 
of different algorithmic paradigms such as Boolean satisfiability (SAT) solving 
and optimization [15], constraint programming (CP) [72], and mixed integer pro- 
gramming (MIP) [1,16]. Today, such solvers are routinely used to solve real-world 
problems with hundreds of thousands or even millions of variables. 

While the performance of modern combinatorial solvers is truly impressive, 
one negative aspect is that they are highly complex pieces of software, and 
it is well documented that even mature state-of-the-art solvers sometimes give 
wrong results [2,18,25,37]. This can be fatal for applications where correctness is
a non-negotiable demand. Perhaps the most successful approach for addressing 
this problem so far is the requirement in the SAT solving community that solvers 
should be certifying [3,62], meaning that when given a formula a solver should
output not only a verdict whether the formula is satisfiable or unsatisfiable, but 
also an efficiently machine-verifiable proof log establishing that this verdict is 
guaranteed to be correct. One can then feed the input formula, the verdict, and 
the proof log to a special, dedicated proof checker, and accept the result if the 
proof checker agrees that the proof log shows that the solver computation is 
valid. Over the years, different proof formats such as RUP [43], TraceCheck [14], 
DRAT [44,45], GRIT [27], and LRAT [26] have been developed, and for almost 
a decade DRAT proof logging has been compulsory in the (main track of the) 
SAT competition. However, there has been very limited progress in designing 
analogous proof logging techniques for more powerful algorithmic paradigms. 
Our focus in this work is on the optimization paradigm that is arguably 
closest to SAT solving, namely maximum satisfiability or MaxSAT solving [8,56], 
and the challenge of developing proof logging techniques for MaxSAT solvers. 


1.1 Previous Work 


Since essentially all modern MaxSAT solvers are based on repeated invocations 
of SAT solvers, a first question is why SAT proof logging techniques are not 
sufficient. While DRAT is a very powerful proof system, it seems that the over- 
head of generating proofs of correctness for the rewriting steps in between SAT 
solver calls in MaxSAT solvers is too large to be tolerable for practical purposes. 
Another, related, problem is that for optimization problems one needs to reason 
about the objective function, which DRAT struggles to do since its language is 
limited to disjunctive clauses. But perhaps the biggest challenge is that while 
modern SAT solving is completely dominated by the conflict-driven clause learn- 
ing (CDCL) method [11,59,66], for MaxSAT there is a rich variety of approaches 
including linear SAT-UNSAT (or model-improving search) [31,54,68], core- 
guided search [4,7,35,67], implicit hitting set (IHS) search [28,29], and some 
recent work on branch-and-bound methods [57] (where we stress that the lists 
of references are far from exhaustive). 

One tempting solution to circumvent this heterogeneity of solving approaches 
is to treat the MaxSAT solver as a black box and use a single call to a certify- 
ing SAT solver to prove optimality of the final solution found. However, there are 
several problems with this proposal. Firstly, we would still need proof logging to 
ensure that the input to the SAT solver is a correct encoding of a claim of optimal- 
ity for the correct problem instance. Secondly, such a SAT call could be extremely 
expensive, running counter to the goal of proof logging with low (and predictable) 
overhead. Finally, even if the SAT-call approach could be made to work efficiently, 
this would just certify the final result, and would not help validate the correctness 
of the reasoning of the solver. For these reasons, our goal is to provide proof logging 
for the actual computations of the MaxSAT algorithm. 

While some proof systems and tools have been developed specifically for 
MaxSAT [19,34,48,53,64,65,69–71], none of them comes close to providing
general-purpose proof logging, because they apply only for very specific algo- 
rithm implementations and/or fail to capture the full range of reasoning used in 
an algorithmic approach. A recent work [75] by two co-authors on the current
paper instead leverages the pseudo-Boolean proof logging system VERIPB [76] 
to certify correctness of the unweighted linear SAT-UNSAT solver QMAXSAT. 
VERIPB is similar in spirit to DRAT, but operates with more general 0-1 linear 
inequalities rather than just clauses. This simplifies reasoning about optimiza- 
tion problems, and also makes it possible to capture the powerful MaxSAT solver 
inferences in a more concise way. VERIPB has previously been used for proof 
logging of enhanced SAT solving techniques [17,42] and pseudo-Boolean solv- 
ing [38], as well as for providing proof-of-concept tools for a nontrivial range of 
techniques in constraint programming [33,41] and subgraph solving [39,40]. 


1.2 Our Contributions 


In this work, we use VERIPB to provide, to the best of our knowledge for the 
first time, efficient proof logging for the full range of techniques in a cutting-edge 
MaxSAT solver. We consider the state-of-the-art core-guided solver CGSS [47], 
based on RC2 [46], and show how to enhance CGSS to output proofs of cor- 
rectness of its reasoning, including sophisticated techniques such as stratifica- 
tion [6,58], intrinsic-at-most-one constraints [46], hardening [6], weight-aware 
core-extraction [13], and structure sharing [47]. We find that the overhead for 
such proof logging is perfectly manageable, and although there is certainly room 
to improve the proof verification time, our experiments demonstrate that already 
a first proof-of-concept implementation of this approach is practically feasible. 
It has been shown previously [32,39,52] that proof logging can also serve as 
a powerful debugging tool. This is because faulty reasoning is likely to lead to 
unsound proofs, which can be detected even if the solver produces correct output 
for all test cases. We exhibit yet another example of this—some proofs for which 
we struggled to make the verification work turned out to reveal two well-hidden 
bugs in RC2 and CGSS that earlier extensive testing had failed to uncover. 
Although it still remains to provide proof logging for other MaxSAT 
approaches such as (general, weighted) linear SAT-UNSAT and implicit hitting 
set (IHS) search, we are optimistic that our work could serve as an important 
step towards general adoption of proof logging techniques for MaxSAT solvers. 


1.3 Outline of This Paper 


After reviewing preliminaries for pseudo-Boolean reasoning and core-guided 
MaxSAT solving in Sects. 2 and 3, we explain how core-guided MaxSAT solvers 
can be equipped with proof logging methods in Sect. 4. In Sect. 5 we present our 
experimental evaluation, after which some concluding remarks and directions for 
future research are given in Sect. 6. 


2 Preliminaries 


We start with a review of some standard material which can be found, e.g., in
[20,38,42]. A literal $\ell$ over a Boolean variable $x$ (taking values in $\{0,1\}$, which we
identify with false and true, respectively) is $x$ itself or its negation $\overline{x}$, where
$\overline{x} = 1 - x$. A pseudo-Boolean (PB) constraint is a 0-1 integer linear inequality
$C \doteq \sum_i a_i \ell_i \geq A$ (where $\doteq$ denotes syntactic equality). When convenient, we
can assume without loss of generality that PB constraints are in normalized
form [10]; i.e., all literals $\ell_i$ are over distinct variables and the coefficients $a_i$
and the degree (of falsity) $A$ are non-negative integers. The set of literals in
$C$ is denoted $\mathit{lits}(C)$. The negation of $C$ is $\neg C \doteq \sum_i a_i \ell_i \leq A - 1$ (rewritten
in normalized form when needed). A pseudo-Boolean formula is a conjunction
$F \doteq \bigwedge_j C_j$ of PB constraints. Note that a disjunctive clause can be viewed as a
PB constraint with all coefficients and the degree equal to 1, and so formulas in
conjunctive normal form (CNF) are special cases of PB formulas.
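
As an illustration of normalized form and negation, the following minimal Python sketch rewrites an arbitrary integer linear inequality over literals into normalized form. The representation is an assumption made only for this example: a literal is a nonzero integer (`v` for a variable, `-v` for its negation), and a constraint is a pair of a coefficient dictionary and a degree. The function names `normalize` and `negate` are hypothetical and do not come from any particular tool.

```python
def normalize(terms, degree):
    """Normalize sum(a * lit) >= degree.

    terms: list of (coefficient, literal) pairs with integer coefficients;
    a literal is a nonzero int, v for variable x_v and -v for its negation
    (an encoding assumed for this sketch).
    Returns ({literal: coefficient}, degree) with positive coefficients,
    at most one literal per variable, and a non-negative degree.
    """
    net = {}  # net coefficient of each variable's positive literal
    for a, lit in terms:
        var = abs(lit)
        if lit > 0:
            net[var] = net.get(var, 0) + a
        else:
            # a * ~x = a * (1 - x) = a - a * x: move the constant a to the right
            net[var] = net.get(var, 0) - a
            degree -= a
    coeffs = {}
    for var, c in net.items():
        if c > 0:
            coeffs[var] = c
        elif c < 0:
            # c * x = c * (1 - ~x) = c + |c| * ~x: move the constant c to the right
            coeffs[-var] = -c
            degree -= c
    return coeffs, max(degree, 0)


def negate(coeffs, degree):
    """Negation of sum >= A, i.e. sum <= A - 1, returned in normalized form."""
    return normalize([(-a, lit) for lit, a in coeffs.items()], 1 - degree)


clause = normalize([(1, 1), (1, -2)], 1)   # the clause x1 or not x2
negated = negate({1: 2, 2: 1}, 2)          # not (2*x1 + x2 >= 2)
```

Here `clause` is the PB constraint with both coefficients and the degree equal to 1, and `negated` is the constraint $2\overline{x}_1 + \overline{x}_2 \geq 2$, which holds exactly when $x_1 = 0$.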

A (partial) assignment $\rho$ is a (partial) function from variables to $\{0,1\}$, which
we extend to literals by respecting the meaning of negation. Applying $\rho$ to a
constraint $C$ yields $C{\upharpoonright}_\rho$ by substituting the variables assigned in $\rho$ by their values,
and for a formula $F \doteq \bigwedge_j C_j$ we define $F{\upharpoonright}_\rho \doteq \bigwedge_j C_j{\upharpoonright}_\rho$. The constraint $C$ is
satisfied by $\rho$ if $\sum_{\rho(\ell_i)=1} a_i \geq A$, and $\rho$ satisfies $F$ if it satisfies all $C \in F$, in which
case $F$ is satisfiable. A formula lacking satisfying assignments is unsatisfiable.
We say that $F$ implies $C$, denoted $F \models C$, if any assignment satisfying $F$ also
satisfies $C$.

An objective $O \doteq \sum_i w_i \ell_i + M$ is an affine function over literals $\ell_i$ to be minimized
by (total) assignments $\alpha$ satisfying $F$. The value (or cost) of an objective $O$
under such an $\alpha$, which we refer to as a solution, is $O(\alpha) = \sum_{\alpha(\ell_i)=1} w_i + M$.
We write $\mathit{coeff}(O, \ell_i)$ to denote the coefficient $w_i$ of a literal $\ell_i \in \mathit{lits}(O)$.

The foundation of the pseudo-Boolean proof logging in this paper is the cutting
planes proof system [24], which is a method to iteratively derive new constraints
implied by a pseudo-Boolean formula $F$. If $C$ and $D$ have been derived
before or are axiom constraints in $F$, then any positive linear combination of
these constraints can be derived. Literal axioms $\ell \geq 0$ can also be added to any
previously derived constraints. For a constraint $\sum_i a_i \ell_i \geq A$ in normalized form,
division by a positive integer $d$ derives $\sum_i \lceil a_i/d \rceil \ell_i \geq \lceil A/d \rceil$, and we also add
a saturation rule that derives $\sum_i \min\{a_i, A\} \cdot \ell_i \geq A$ (where the soundness of
these rules crucially depends on the normalized form). It is well known that any
PB constraint implied by $F$ can be derived using these rules.

A constraint $C$ is said to unit propagate the literal $\ell$ to true under an assignment
$\rho$ if $C{\upharpoonright}_\rho$ cannot be satisfied unless $\ell$ is true. During unit propagation on
$F$ under $\rho$, we extend $\rho$ iteratively by any propagated literals until an assignment
$\rho'$ is reached under which no constraint $C \in F$ is propagating or some
constraint $C$ wants to propagate a literal that has already been assigned to the
opposite value. The latter case is called a conflict, since $C$ is violated by $\rho'$. We
say that $F$ implies $C$ by reverse unit propagation (RUP), and that $C$ is a RUP
constraint with respect to $F$, if $F \wedge \neg C$ unit propagates to conflict under the
empty assignment. It is not hard to see that $F \models C$ holds if $C$ is a RUP constraint,
and as a convenient shorthand we will add a RUP rule for deriving new
constraints.
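
A RUP check along these lines can be sketched in a few lines of Python. The sketch assumes normalized constraints stored as a dictionary from integer literals (`v` or `-v`) to positive coefficients plus a degree, and it uses the slack of a constraint (the sum of coefficients of non-falsified literals minus the degree) to detect propagation and conflict. The names `propagate` and `is_rup` are hypothetical; this is a naive fixpoint loop for illustration, not how a proof checker would implement propagation efficiently.

```python
def slack(coeffs, degree, true_lits):
    """Sum of coefficients of literals not falsified so far, minus the degree."""
    return sum(a for l, a in coeffs.items() if -l not in true_lits) - degree


def propagate(constraints, assignment=frozenset()):
    """Unit propagation; returns None on conflict, else the set of true literals."""
    true_lits = set(assignment)
    changed = True
    while changed:
        changed = False
        for coeffs, degree in constraints:
            s = slack(coeffs, degree, true_lits)
            if s < 0:
                return None  # constraint violated: conflict
            for l, a in coeffs.items():
                unassigned = l not in true_lits and -l not in true_lits
                if unassigned and a > s:
                    # falsifying l would make the slack negative, so l is forced
                    true_lits.add(l)
                    changed = True
    return true_lits


def negate(coeffs, degree):
    """Normalized negation: sum a_i * ~l_i >= sum a_i - A + 1."""
    return {-l: a for l, a in coeffs.items()}, sum(coeffs.values()) - degree + 1


def is_rup(formula, constraint):
    """C is RUP w.r.t. F if F together with not-C propagates to a conflict."""
    return propagate(list(formula) + [negate(*constraint)]) is None


formula = [({1: 1, 2: 1}, 1), ({-1: 1, 2: 1}, 1)]   # (x1 or x2), (~x1 or x2)
unit_x2 = ({2: 1}, 1)                                # x2 >= 1
```

With these two clauses, `is_rup(formula, unit_x2)` holds: assuming $\overline{x}_2$ forces $x_1$ via the first clause, which violates the second.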
In addition to deriving constraints that are implied by a formula $F$, we also
allow deriving so-called redundant constraints $C$ that are not implied by $F$ as
long as some optimal solution is guaranteed to be preserved. This is done by
extending the proof system with a redundance-based strengthening rule [17,42].
We will only need the special case of this rule saying that for a fresh variable $z$
and for any constraint $D \doteq \sum_i a_i \ell_i \geq A$ we can introduce the reified constraints

$$C^{\mathrm{reif}}_{\Rightarrow}(z, D) \doteq A \cdot \overline{z} + \sum_i a_i \ell_i \geq A \tag{1a}$$

$$C^{\mathrm{reif}}_{\Leftarrow}(z, D) \doteq \Bigl(\sum_i a_i - A + 1\Bigr) \cdot z + \sum_i a_i \overline{\ell}_i \geq \sum_i a_i - A + 1 \tag{1b}$$

encoding the implications $z \Rightarrow D$ and $z \Leftarrow D$, respectively. We refer to $z$ as the
reification variable, and when $D$ is clear from context, we will sometimes write
just $C^{\mathrm{reif}}_{\Rightarrow}(z)$ for (1a) and $C^{\mathrm{reif}}_{\Leftarrow}(z)$ for (1b).
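
The effect of reification can be checked by enumeration on a small constraint. The sketch below assumes the usual encodings for a normalized $D \doteq \sum_i a_i \ell_i \geq A$, namely $A \cdot \overline{z} + \sum_i a_i \ell_i \geq A$ for $z \Rightarrow D$ and $(\sum_i a_i - A + 1) \cdot z + \sum_i a_i \overline{\ell}_i \geq \sum_i a_i - A + 1$ for $z \Leftarrow D$. Literals are integers (`v` or `-v`), and the function names are hypothetical.

```python
from itertools import product


def holds(coeffs, degree, assignment):
    """Evaluate a normalized constraint; assignment maps variable -> 0 or 1."""
    value = lambda l: assignment[l] if l > 0 else 1 - assignment[-l]
    return sum(a * value(l) for l, a in coeffs.items()) >= degree


def reify(z, coeffs, degree):
    """Return the two reified constraints for z => D and z <= D (assumed encodings)."""
    total = sum(coeffs.values())
    forward = ({**coeffs, -z: degree}, degree)
    backward_degree = total - degree + 1
    backward = ({**{-l: a for l, a in coeffs.items()}, z: backward_degree},
                backward_degree)
    return forward, backward


def reification_is_exact(z, coeffs, degree):
    """Brute force: both constraints hold exactly when z equals the truth of D."""
    forward, backward = reify(z, coeffs, degree)
    variables = sorted({abs(l) for l in coeffs} | {z})
    for bits in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        both = holds(*forward, assignment) and holds(*backward, assignment)
        if both != (assignment[z] == int(holds(coeffs, degree, assignment))):
            return False
    return True


d = ({1: 2, 2: 1, 3: 1}, 2)        # D: 2*x1 + x2 + x3 >= 2
forward, backward = reify(4, *d)   # z is the fresh variable x4
```

For this $D$, the two constraints are $2\overline{z} + 2x_1 + x_2 + x_3 \geq 2$ and $3z + 2\overline{x}_1 + \overline{x}_2 + \overline{x}_3 \geq 3$, and `reification_is_exact(4, *d)` confirms that together they force $z$ to agree with $D$.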

The maximum satisfiability (MaxSAT) problem can be described conveniently
as a special case of pseudo-Boolean optimization. A discussion on the equivalence
of the following and the more classical clause-centric definition can be found
in, for instance, [8,55]. An instance $(F, O)$ of the (weighted partial) MaxSAT
problem consists of a CNF formula $F$ and an objective function $O$ written as a
non-negative affine combination of literals. The goal is to find a solution $\alpha$ that
satisfies $F$ and minimizes $O(\alpha)$. We say that such a solution $\alpha$ is optimal for the
instance and that the optimal cost of the instance $(F, O)$ is $O(\alpha)$.


3 The OLL Algorithm for Core-Guided MaxSAT Solving 


We now proceed to discuss the core-guided MaxSAT solving in CGSS, which is
based on the OLL algorithm [5,63], and describe the main heuristics used in efficient
implementations of this algorithm. Given a MaxSAT instance $(F_{\mathrm{orig}}, O_{\mathrm{orig}})$,
OLL takes an optimistic view and attempts to find an assignment satisfying $F_{\mathrm{orig}}$
in which $O_{\mathrm{orig}}$ equals its constant term (i.e., all literals in $\mathit{lits}(O_{\mathrm{orig}})$ are false).
If such a solution exists, it is clearly optimal. Otherwise, the solver will extract
a core $K$, which is a clause such that (i) $K$ only contains objective literals,
i.e., $\mathit{lits}(K) \subseteq \mathit{lits}(O_{\mathrm{orig}})$, and (ii) $F_{\mathrm{orig}}$ implies $K$, which means that any
solution to $F_{\mathrm{orig}}$ has to set at least one literal in $\mathit{lits}(K)$ to true. The cost
$w(K, O) = \min\{\mathit{coeff}(O, \ell) : \ell \in \mathit{lits}(K)\}$ of a core $K$ is the smallest coefficient
in the objective $O$ of any literal in $K$. The core $K$ is used to (conceptually)
reformulate the instance into $(F_{\mathrm{ref}}, O_{\mathrm{ref}})$, which has the same minimal-cost solutions.
The constant term LB in $O_{\mathrm{ref}}$ is a lower bound on the optimal cost of
the instance, and the reformulation is done in such a way that the lower bound
increases (exactly) with the cost of the core $K$ as defined above.

In more detail, the algorithm maintains a reformulated objective $O_{\mathrm{ref}}$ (initialized
to $O_{\mathrm{orig}}$) such that the (non-normalized) pseudo-Boolean constraint

$$O_{\mathrm{orig}} \geq O_{\mathrm{ref}} \doteq \sum_{b \in \mathit{lits}(O_{\mathrm{orig}})} \mathit{coeff}(O_{\mathrm{orig}}, b) \cdot b \;\geq \sum_{b' \in \mathit{lits}(O_{\mathrm{ref}})} \mathit{coeff}(O_{\mathrm{ref}}, b') \cdot b' + \mathit{LB} \tag{2}$$

is satisfied by all solutions of $F_{\mathrm{ref}}$. Note that the constraint (2), which we refer
to as an objective reformulation constraint, implies that the constant term LB
is a lower bound on the optimal cost.

In each iteration, a SAT solver is queried for a solution $\alpha$ to $F_{\mathrm{ref}}$ with
$O_{\mathrm{ref}}(\alpha) = \mathit{LB}$. If such an $\alpha$ exists, the constraint (2) yields that $O_{\mathrm{orig}}(\alpha) = \mathit{LB}$,
and so $\alpha$ is a minimal-cost solution to $(F_{\mathrm{orig}}, O_{\mathrm{orig}})$. Otherwise, the solver returns
a new core $K$ that requires at least one literal in $\mathit{lits}(O_{\mathrm{ref}})$ to be set to 1. This
implies that the optimal cost is strictly larger than LB, and the core $K$ is used
for a new reformulation step.

The objective reformulation step adds new clauses to $F_{\mathrm{ref}}$ encoding the constraints
$y_{K,k} \Leftrightarrow \sum_{b \in \mathit{lits}(K)} b \geq k$ for $k = 2, \ldots, |K|$. The new variables $y_{K,k}$
are added to $O_{\mathrm{ref}}$ with coefficient $w(K, O_{\mathrm{ref}})$ equalling the cost of $K$, and the
coefficient in $O_{\mathrm{ref}}$ of each literal in $K$ is decreased by the same amount. Finally,
the lower bound LB (the constant term of $O_{\mathrm{ref}}$) is also increased by $w(K, O_{\mathrm{ref}})$.
Since $y_{K,k}$ encodes that at least $k$ literals in $K$ are true, we have the equality
$\sum_{b \in \mathit{lits}(K)} b = 1 + \sum_{k=2}^{|K|} y_{K,k}$, where the additive 1 comes from the fact that at
least one literal in $K$ has to be true, and the reformulation step is just applying
this equality multiplied by $w(K, O_{\mathrm{ref}})$ to $O_{\mathrm{ref}}$. Notice that the variables added
during objective reformulation can later be discovered in other cores. In practice,
all implementations of OLL we are aware of encode the semantics of counting
variables incrementally [60]. This means that initially only the variable $y_{K,2}$ is
defined, and the variable $y_{K,i+1}$ is introduced only after $y_{K,i}$ is found in a core.
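
A single reformulation step can be sketched as follows. This is a simplified illustration under stated assumptions, not the CGSS or RC2 code: the objective is a dictionary from positive integer literals to coefficients, the counting variables are all introduced eagerly rather than incrementally, and the names `reformulate`, `cost`, and `costs_agree` are hypothetical. The check at the end confirms by enumeration that the original and reformulated objectives agree whenever the core is satisfied and the counting variables have their intended meaning.

```python
from itertools import count, product


def reformulate(objective, lb, core, fresh):
    """One OLL step: returns (new objective, new lower bound, {k: y_{K,k}})."""
    w = min(objective[b] for b in core)          # cost of the core
    new_objective = dict(objective)
    for b in core:
        new_objective[b] -= w                    # lower each core literal by w
        if new_objective[b] == 0:
            del new_objective[b]
    counters = {}
    for k in range(2, len(core) + 1):
        counters[k] = next(fresh)                # fresh variable for y_{K,k}
        new_objective[counters[k]] = w
    return new_objective, lb + w, counters


def cost(objective, constant, assignment):
    """Constant term plus the coefficients of all literals assigned 1."""
    return constant + sum(w for lit, w in objective.items() if assignment[lit])


def costs_agree(objective, new_objective, lb, core, counters):
    """Enumerate assignments satisfying the core and compare both costs."""
    variables = sorted(objective)
    for bits in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        true_in_core = sum(assignment[b] for b in core)
        if true_in_core == 0:
            continue  # excluded: the core forces at least one literal to be true
        for k, y in counters.items():
            assignment[y] = int(true_in_core >= k)   # semantics of y_{K,k}
        if cost(objective, 0, assignment) != cost(new_objective, lb, assignment):
            return False
    return True


objective = {1: 3, 2: 2, 3: 5}                   # 3*x1 + 2*x2 + 5*x3
new_objective, lb, counters = reformulate(objective, 0, [1, 2], count(10))
```

For the core $x_1 \lor x_2$ with cost 2, the objective $3x_1 + 2x_2 + 5x_3$ becomes $x_1 + 5x_3 + 2y_{K,2} + 2$, where variable 10 plays the role of $y_{K,2}$.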

Implementations of OLL for MaxSAT, including the CGSS solver that we
enhance with proof logging in this work, extend the algorithm with a number of
heuristics such as stratification [6,58], hardening [6], the intrinsic-at-most-one
technique [46], weight-aware core extraction [13], and structure sharing [47].

Stratification extracts cores not over all literals in $O_{\mathrm{ref}}$ but only over those
whose coefficient is above some bound $w_{\mathrm{strat}}$. This steers search toward cores
containing literals with high coefficients, resulting in larger increases of LB. Once
no more cores over such variables can be found, the algorithm lowers $w_{\mathrm{strat}}$,
terminating only after no more cores can be found with $w_{\mathrm{strat}} = 1$. The fact that
no more cores containing only variables with coefficients above $w_{\mathrm{strat}}$ exist is
detected by the SAT solver returning a (possibly non-optimal) solution $\alpha$. The
minimal cost $O_{\mathrm{orig}}(\alpha)$ of all such solutions gives an upper bound UB on the
optimal cost of the instance, allowing OLL to terminate as soon as LB = UB.

Hardening fixes literals in $O_{\mathrm{ref}}$ to 0 based on information provided by the
current upper and lower bounds UB and LB. If for any $b \in \mathrm{lits}(O_{\mathrm{ref}})$ it holds
that $\mathrm{coeff}(O_{\mathrm{ref}}, b) + \mathrm{LB} > \mathrm{UB}$, then any solution $\alpha$ with $b = 1$ would have higher
cost than the current best solution known, and would thus not be optimal.
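As a rough illustration of the two heuristics just described, the following Python sketch selects the literals assumed false under a stratification bound and the literals that can be hardened. The function names are hypothetical, and the sketch assumes that "above the bound" means a coefficient of at least $w_{\mathrm{strat}}$ (so that $w_{\mathrm{strat}} = 1$ covers all literals).

```python
def assumption_literals(objective, w_strat):
    """Literals set to 0 in the next SAT call (coefficient >= w_strat, an assumption)."""
    return sorted(lit for lit, c in objective.items() if c >= w_strat)


def hardenable_literals(objective, lower_bound, upper_bound):
    """Literals b with coeff(O_ref, b) + LB > UB, which can be fixed to 0."""
    return sorted(lit for lit, c in objective.items() if c + lower_bound > upper_bound)


# Illustrative reformulated objective b3 + b4 + 5*y + 5 with LB = 5 and UB = 7.
o_ref = {"b3": 1, "b4": 1, "y_K1_2": 5}
print(assumption_literals(o_ref, 2))     # ['y_K1_2']
print(hardenable_literals(o_ref, 5, 7))  # ['y_K1_2']
```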

The intrinsic-at-most-one technique identifies subsets $S \subseteq \mathrm{lits}(O_{\mathrm{ref}})$ of objective
literals such that $\sum_{b \in S} \overline{b} \le 1$ is implied, i.e., any solution can assign at most
one literal in $S$ to 0. This is used both to increase the lower bound and to
reformulate the objective. If we let $w_{\min} = \min\{\mathrm{coeff}(O_{\mathrm{ref}}, b) : b \in S\}$, then $S$ implies
a lower bound increase of $\mathrm{LB}_S = (|S| - 1) \cdot w_{\min}$. Additionally, we define a new
variable $\ell_S$ by the clause $\ell_S + \sum_{b \in S} \overline{b} \ge 1$ to indicate if in fact all literals in $S$
are true, and introduce it in the reformulated objective with coefficient $w_{\min}$.
This means that we remove the already known lower bound $\mathrm{LB}_S$ from $O_{\mathrm{ref}}$ and
transfer the possible additional cost $w_{\min}$ from $S$ to the variable $\ell_S$.
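The effect of this reformulation can be checked on a small example. The sketch below is one reading of the paragraph above, not code from any solver: it assumes that every literal in $S$ has its coefficient lowered by $w_{\min}$, and the name `amo_reformulate` and the variable name `l_S` are hypothetical. The loop confirms that the cost is preserved on all assignments with at most one literal of $S$ false.

```python
from itertools import product


def amo_reformulate(objective, lower_bound, subset):
    """Reformulate the objective for a set S with at most one literal false."""
    w_min = min(objective[b] for b in subset)
    new_objective = dict(objective)
    for b in subset:
        new_objective[b] -= w_min
    new_objective["l_S"] = w_min  # l_S is true iff all literals in S are true
    new_objective = {lit: c for lit, c in new_objective.items() if c > 0}
    return new_objective, lower_bound + (len(subset) - 1) * w_min


objective = {"b1": 4, "b2": 7, "b3": 5}
subset = ["b1", "b2", "b3"]
new_objective, new_lb = amo_reformulate(objective, 0, subset)
for bits in product([0, 1], repeat=3):
    if bits.count(0) > 1:  # violates the at-most-one constraint
        continue
    a = dict(zip(subset, bits))
    a["l_S"] = int(all(bits))
    old_cost = sum(objective[b] * a[b] for b in subset)
    new_cost = new_lb + sum(c * a[lit] for lit, c in new_objective.items())
    assert old_cost == new_cost
```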

Weight-aware core extraction (WCE) delays objective reformulation, and the
accompanying increase in new variables and clauses, for as long as possible.
When a new core $K$ is extracted by a solver that uses WCE, initially only the
coefficient of each $b \in \mathrm{lits}(K)$ is lowered and the lower bound LB is increased
by $w(K, O_{\mathrm{ref}})$. Then the SAT solver is invoked again with the literals that
still have coefficients above $w_{\mathrm{strat}}$ in $O_{\mathrm{ref}}$ set to 0. When the SAT solver finds
a satisfying assignment extending the assumptions, all objective reformulation
steps are then performed at once. This is correct since the final effect is the same
as if the cores had been discovered one by one and each immediately followed
by objective reformulation. Notice that this core extraction loop is guaranteed to
terminate since the coefficient of at least one variable is decreased to 0 for each
new core. Structure sharing is a recent extension to weight-aware core extraction
that makes use of the potential overlap in cores detected in order to achieve more
compact encodings of counting variable semantics.


4 Proof Logging for the OLL Algorithm for MaxSAT 


We have now reached a point where we can describe the contribution of this 
work, namely how to add proof logging to an OLL-based core-guided MaxSAT 
solver, including all the state-of-the-art techniques described in Sect. 3. 

In our proof logging routines we maintain the invariants described next. The
reformulated objective $O_{\mathrm{ref}}$ is already implicitly tracked by the solver and at all
times it is possible to derive that $O_{\mathrm{orig}} \ge O_{\mathrm{ref}}$ as in (2). We also keep track of
the current upper bound UB on $O_{\mathrm{orig}}$ and best solution $\alpha_{\mathrm{best}}$ found so far. All
cores that have been found and processed are in the set $\mathcal{K}$.


SAT Solver Calls. The CDCL SAT solvers used in core-guided MaxSAT algo- 
rithms can support DRAT proof logging, and since the proof format used by 
VERIPB is a strict extension of DRAT (modulo small and purely syntactical 
modifications) it is straightforward to provide proof logging for the part of the 
reasoning done in SAT solver calls, and to add all learned clauses to the proof 
checker database. 

Each invocation of the SAT solver returns either a new solution $\alpha$ or a new
core $K$. When a solution $\alpha$ with $O_{\mathrm{orig}}(\alpha) < \mathrm{UB}$ is obtained, it is logged in the
proof, which (with UB updated to $O_{\mathrm{orig}}(\alpha)$) adds the objective-improving constraint

$$O_{\mathrm{orig}} \le \mathrm{UB} - 1 \tag{3a}$$

(which is

$$\sum_{b \in \mathrm{lits}(O_{\mathrm{orig}})} \mathrm{coeff}(O_{\mathrm{orig}}, b) \cdot \overline{b} \;\ge\; 1 + \sum_{b \in \mathrm{lits}(O_{\mathrm{orig}})} \mathrm{coeff}(O_{\mathrm{orig}}, b) - \mathrm{UB} \tag{3b}$$

in normalized form). A technical side remark is that later solutions with cost
UB or greater cannot successfully be logged, since they violate the
constraint (3a) added to the proof checker database, and so the proof logging
routines make sure to only log solutions that improve the current upper bound.
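The step from (3a) to the normalized form (3b) is plain arithmetic, and a short Python sketch makes it concrete. The function names are hypothetical; the objective used is a small illustrative one with coefficients 5, 5, 1, 1. The loop checks by enumeration that the normalized constraint holds exactly for the assignments of cost at most UB - 1.

```python
from itertools import product


def objective_improving(objective, upper_bound):
    """Normalized form of O <= UB - 1: coefficients on negated literals and a degree."""
    degree = 1 + sum(objective.values()) - upper_bound
    return dict(objective), degree


def holds(negated_coeffs, degree, assignment):
    """Check sum of coeff * (1 - value of literal) >= degree."""
    return sum(c * (1 - assignment[lit]) for lit, c in negated_coeffs.items()) >= degree


objective = {"b1": 5, "b2": 5, "b3": 1, "b4": 1}
coeffs, degree = objective_improving(objective, 7)  # degree is 1 + 12 - 7 = 6
for bits in product([0, 1], repeat=4):
    a = dict(zip(objective, bits))
    cost = sum(c * a[lit] for lit, c in objective.items())
    assert holds(coeffs, degree, a) == (cost <= 6)
```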
If the SAT solver instead returns a new core $K$, this clause is guaranteed to
be a reverse unit propagation (RUP) clause with respect to the set of clauses
currently in the solver database, and so we can use the RUP rule to add $K$
to the proof checker database (which contains a superset of the clauses known
by the solver). For our book-keeping, we also add $K$ to the set $\mathcal{K}$. A special
case is that $K$ could be the contradictory empty clause, corresponding to the
pseudo-Boolean constraint $0 \ge 1$. This means that there are no solutions to the
formula.

To optimize the efficiency of proof verification, constraints should be deleted 
from the proof when they are no longer needed. Since SAT solver proofs are 
only used to prove unsatisfiability this does not cause any issues, but when 
certifying optimality we have to be careful in order not to create better-than- 
optimal solutions (which could happen if, e.g., constraints in the input formula 
are removed). The checked deletion rule [17] ensuring this in VERIPB does not 
have any analogue in DRAT, so some care is needed here when translating SAT 
solver proofs into the VERIPB format. 


Incremental Totalizer with Structure Sharing. Different implementations of OLL 
for MaxSAT differ in which encoding is used for the counting variables introduced 
during objective reformulation [9,50,51]. The two solvers we consider use total- 
izers [9], so we start by explaining this encoding and then show how to provide 
proof logging for the clauses added to the proof checker database. 

The totalizer encoding for a set $I = \{\ell_1, \ldots, \ell_n\}$ of literals is a CNF formula $T$
that defines counting variables $y_{I,j}$ for $j = 1, \ldots, n$ such that for any assignment
that satisfies $T$ the variable $y_{I,j}$ is true if and only if $\sum_{i=1}^{n} \ell_i \ge j$. The structure
of $T$ can be viewed as a binary tree, with literals in $I$ at the leaves and with
each internal node $\eta$ associated with variables counting the true leaf literals in
the subtree rooted at $\eta$. The variables $y_{I,j}$ are associated with the root of the
tree.

More formally, given a set of literals $I$, we construct a binary tree with leaves
labelled by the literals in $I$. For every node $\eta$ of $T$, let $\mathrm{lits}(\eta)$ denote the leaves
in the subtree rooted at $\eta$; where it is convenient, we will overload $I$ to also refer
to the root node. For each internal node $\eta$, the totalizer encoding introduces
the counting variables $S_\eta = \{y_{\eta,1}, \ldots, y_{\eta,|\mathrm{lits}(\eta)|}\}$, the meaning of which can be
encoded recursively in terms of the variables $S_{\eta_1}$ and $S_{\eta_2}$ for the children $\eta_1$
and $\eta_2$ of $\eta$ by the (pseudo-Boolean form of the) clauses

$$C^{\eta}_{\Leftarrow}(\alpha, \beta, \sigma) = y_{\eta,\sigma} + \overline{y}_{\eta_1,\alpha} + \overline{y}_{\eta_2,\beta} \ge 1 \tag{4a}$$
$$C^{\eta}_{\Rightarrow}(\alpha, \beta, \sigma) = \overline{y}_{\eta,\sigma+1} + y_{\eta_1,\alpha+1} + y_{\eta_2,\beta+1} \ge 1 \tag{4b}$$

for all integers $\alpha, \beta, \sigma$ such that $\alpha + \beta = \sigma$ and $0 \le \alpha \le |\mathrm{lits}(\eta_1)|$, $0 \le \beta \le
|\mathrm{lits}(\eta_2)|$, and $0 \le \sigma \le |\mathrm{lits}(\eta)|$. We use the notational conventions in (4a)-(4b)
that $y_{\ell,1} = \ell$ for all leaves $\ell$, and that $y_{\eta,0} = 1$ and $y_{\eta,|\mathrm{lits}(\eta)|+1} = 0$ for
all nodes $\eta$ (so that clauses containing $y_{\eta,0}$ or $y_{\eta,|\mathrm{lits}(\eta)|+1}$ can be simplified to
binary clauses or be omitted when they are satisfied). The clauses $C^{\eta}_{\Rightarrow}(\alpha, \beta, \sigma)$
in (4b) are not necessarily added to the clause database of the MaxSAT solver,
but are sometimes included for improved propagation.
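To make the clauses (4a)-(4b) concrete, the following Python sketch generates them for a single internal node and checks by enumeration that they pin down the counting variables of that node. It is an illustration only: the node names `"eta"`, `"eta1"`, `"eta2"` and both function names are hypothetical, and literals are encoded as triples `(node, index, polarity)`.

```python
from itertools import product


def lit(node, j, size, positive):
    """A literal on y_{node,j}, or a bool if the conventions fix its value."""
    if j == 0:         # y_{node,0} = 1
        return positive
    if j == size + 1:  # y_{node,size+1} = 0
        return not positive
    return (node, j, positive)


def totalizer_node_clauses(n1, n2):
    """Clauses (4a) and (4b) for a node whose children have n1 and n2 leaves."""
    n = n1 + n2
    clauses = []
    for a in range(n1 + 1):
        for b in range(n2 + 1):
            s = a + b
            c4a = [lit("eta", s, n, True), lit("eta1", a, n1, False), lit("eta2", b, n2, False)]
            c4b = [lit("eta", s + 1, n, False), lit("eta1", a + 1, n1, True),
                   lit("eta2", b + 1, n2, True)]
            for clause in (c4a, c4b):
                if True in clause:  # clause already satisfied, omit it
                    continue
                clauses.append([l for l in clause if l is not False])
    return clauses


def satisfying_root_values(n1, n2, t1, t2):
    """All values of y_{eta,1..n} consistent with t1 and t2 true leaves in the children."""
    clauses = totalizer_node_clauses(n1, n2)
    result = []
    for bits in product([False, True], repeat=n1 + n2):
        def holds(literal):
            node, j, positive = literal
            if node == "eta":
                val = bits[j - 1]
            elif node == "eta1":
                val = t1 >= j
            else:
                val = t2 >= j
            return val == positive
        if all(any(holds(l) for l in clause) for clause in clauses):
            result.append(bits)
    return result


# With 1 true leaf on the left and 2 on the right, y_{eta,j} is forced to (3 >= j).
print(satisfying_root_values(2, 3, 1, 2))  # [(True, True, True, False, False)]
```

Generating only the (4a) clauses would leave the counting variables free to be true when fewer leaves are true, which is the one-directional encoding mentioned in the text.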

We now turn to the question of how to derive the clauses (4a)-(4b) encoding
the meaning of the counting variables $y_{I,j}$ in the proof. This is a two-step
process. First, reified pseudo-Boolean (and, in general, non-clausal) constraints
$C^{\mathrm{reif}}_{\Rightarrow}(y_{\eta,j})$ and $C^{\mathrm{reif}}_{\Leftarrow}(y_{\eta,j})$ as in (1a)-(1b), encoding that $y_{\eta,j}$ holds if and only
if $\sum_{\ell \in \mathrm{lits}(\eta)} \ell \ge j$, are derived by redundance-based strengthening. Then the
clauses added to the MaxSAT solver are derived from these pseudo-Boolean
constraints. Although we omit the details due to space constraints, it is not hard to
show that for any internal node $\eta$ with children $\eta_1$ and $\eta_2$, the clauses $C^{\eta}_{\Leftarrow}(\alpha, \beta, \sigma)$
and $C^{\eta}_{\Rightarrow}(\alpha, \beta, \sigma)$ in (4a)-(4b) can be derived from the constraints $C^{\mathrm{reif}}_{\Rightarrow}(y_{\eta,\sigma})$,
$C^{\mathrm{reif}}_{\Leftarrow}(y_{\eta,\sigma})$, $C^{\mathrm{reif}}_{\Rightarrow}(y_{\eta_1,\alpha})$, $C^{\mathrm{reif}}_{\Leftarrow}(y_{\eta_1,\alpha})$, $C^{\mathrm{reif}}_{\Rightarrow}(y_{\eta_2,\beta})$, and $C^{\mathrm{reif}}_{\Leftarrow}(y_{\eta_2,\beta})$ by standard
cutting planes derivations as in [75]. In particular, the certification of these totalizers
can be done incrementally: clauses in the encoding can be derived as the
corresponding counter variables are lazily introduced in the OLL algorithm.

This approach is also compatible with structure sharing, where subtrees of
a previously constructed totalizer tree can be reused (to avoid doing the same
work twice). The only constraints from a subtree rooted at $\eta^*$ that are needed
when generating another totalizer encoding at a higher level are the constraints
$C^{\mathrm{reif}}_{\Rightarrow}(y_{\eta^*,\sigma})$ and $C^{\mathrm{reif}}_{\Leftarrow}(y_{\eta^*,\sigma})$ defining the counter variables in the subtree root $\eta^*$.

To decrease the memory usage of the proof checker, it can be useful to delete
reification constraints from the proof once we know that they will no longer be
needed. Without structure sharing, for an internal node $\eta$, once all clauses that
mention $y_{\eta,j}$ are created, the constraints $C^{\mathrm{reif}}_{\Rightarrow}(y_{\eta,j})$ and $C^{\mathrm{reif}}_{\Leftarrow}(y_{\eta,j})$ will not be
used anymore and can thus be deleted. On the other hand, structure sharing
reuses as many counting variables as possible, even over multiple iterations of
weight-aware core extraction. This means that $C^{\mathrm{reif}}_{\Rightarrow}(y_{\eta,j})$ and $C^{\mathrm{reif}}_{\Leftarrow}(y_{\eta,j})$ need
to be retained, even after all clauses in the totalizer encoding for all parents of
node $\eta$ have been created.


Objective Reformulation. If counting variables $y_{K,i}$ for $i = 2, \ldots, s_K$ have been
introduced for the core $K$, then the objective reformulation with respect to $K$
is derived with the help of the constraint

$$\sum_{b \in K} b \;\ge\; 1 + \sum_{i=2}^{s_K} y_{K,i} \tag{5a}$$

(which is

$$\sum_{b \in K} b + \sum_{i=2}^{s_K} \overline{y}_{K,i} \;\ge\; s_K \tag{5b}$$

in normalized form). The constraint (5b) can in turn be obtained from the core
clause $K$ and the reified constraints $C^{\mathrm{reif}}_{\Rightarrow}(y_{K,i})$. It is clear that this should be
possible, since the latter constraints define the variables $y_{K,i}$ precisely so that (5b)
should hold, and we refer to Algorithm 5 in [38] for the details. Also, each time
a new counting variable $y_{K,i}$ is introduced for a core $K$, we add it to (5b) to
maintain this constraint as an invariant.

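To illustrate how this update works, suppose we have a core $K = \sum_{i=1}^{n} b_i \ge 1$
for which $\sum_{i=1}^{n} b_i + \sum_{i=2}^{s_K - 1} \overline{y}_{K,i} \ge s_K - 1$ has already been derived. The next
counting variable $y_{K,s_K}$ is introduced by the reification $s_K \cdot \overline{y}_{K,s_K} + \sum_{i=1}^{n} b_i \ge s_K$.
The previous constraint is multiplied by $s_K - 1$ and added to the new reified
constraint, yielding $s_K \cdot \sum_{i=1}^{n} b_i + (s_K - 1) \cdot \sum_{i=2}^{s_K - 1} \overline{y}_{K,i} + s_K \cdot \overline{y}_{K,s_K} \ge (s_K - 1) \cdot s_K + 1$.
Dividing this last constraint by $s_K$ (rounding up) results in $\sum_{i=1}^{n} b_i + \sum_{i=2}^{s_K} \overline{y}_{K,i} \ge s_K$, which is
the desired updated constraint.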
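The multiply, add, and divide steps of this update can be replayed mechanically. The Python sketch below is a minimal illustration with hypothetical helper names; constraints are pairs of a coefficient dictionary and a right-hand side, negated literals carry a `~` prefix, and division rounds up as in the cutting planes division rule.

```python
def multiply(c, factor):
    """Multiply a constraint (coeffs, degree) by a positive integer."""
    return {l: factor * a for l, a in c[0].items()}, factor * c[1]


def add(c1, c2):
    """Add two constraints that do not contain complementary literals."""
    coeffs = dict(c1[0])
    for l, a in c2[0].items():
        coeffs[l] = coeffs.get(l, 0) + a
    return coeffs, c1[1] + c2[1]


def divide(c, divisor):
    """Divide and round every coefficient and the degree up."""
    return {l: -(-a // divisor) for l, a in c[0].items()}, -(-c[1] // divisor)


def update_invariant(n, s):
    """Derive the invariant for counters up to y_s from the one up to y_{s-1}."""
    core = {f"b{i}": 1 for i in range(1, n + 1)}
    previous = ({**core, **{f"~y{i}": 1 for i in range(2, s)}}, s - 1)
    reified = ({**core, f"~y{s}": s}, s)
    summed = add(multiply(previous, s - 1), reified)
    return summed, divide(summed, s)


summed, updated = update_invariant(n=4, s=3)
print(summed[1])   # 7, i.e. (s - 1) * s + 1
print(updated[1])  # 3, i.e. s
```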

For a set of extracted cores $\mathcal{K}$, we can derive the objective reformulation
constraint $O_{\mathrm{orig}} \ge O_{\mathrm{ref}}$ by multiplying (5b) for each $K \in \mathcal{K}$ by the cost $w(K, O_{\mathrm{ref}})$
of $K$ and summing up all these multiplied constraints. The fact that we have
an inequality $O_{\mathrm{orig}} \ge O_{\mathrm{ref}}$ rather than an equality is due to the incremental use
of totalizers. More specifically, if $s_K = |\mathrm{lits}(K)|$ held for every $K \in \mathcal{K}$, it
would be possible to derive $O_{\mathrm{orig}} = O_{\mathrm{ref}}$ instead. Here we would like to stress one
subtlety for developing proof logging for OLL: as the algorithm progresses and
more output variables of totalizers are introduced (i.e., the counters $s_K$ increase),
the reformulated objective potentially also increases. Because of the counting
variables added when $s_K$ increases, we have the inequality $O_{\mathrm{orig}} \ge O^{\mathrm{new}}_{\mathrm{ref}} \ge O^{\mathrm{old}}_{\mathrm{ref}}$. For
this reason, the old constraint $O_{\mathrm{orig}} \ge O^{\mathrm{old}}_{\mathrm{ref}}$ cannot be used to derive
$O_{\mathrm{orig}} \ge O^{\mathrm{new}}_{\mathrm{ref}}$ after objective reformulation. Instead, we have to derive $O_{\mathrm{orig}} \ge O_{\mathrm{ref}}$ from
scratch each time the solver argues with the reformulated objective. For doing
this we need to have access to the entire set $\mathcal{K}$ of cores.


Proving Optimality. When the solver has found an optimal solution and
established a matching lower bound, optimality is certified in the proof log using a
proof by contradiction from the objective reformulation constraint $O_{\mathrm{orig}} \ge O_{\mathrm{ref}}$
in (2) and the (normalized form of the) objective-improving constraint
$O_{\mathrm{orig}} \le \mathrm{UB} - 1$ in (3b). If we add these two constraints and cancel like terms, we get

$$\sum_{b' \in \mathrm{lits}(O_{\mathrm{ref}})} \mathrm{coeff}(O_{\mathrm{ref}}, b') \cdot \overline{b'} \;\ge\; 1 - \mathrm{UB} + \mathrm{LB} + \sum_{b' \in \mathrm{lits}(O_{\mathrm{ref}})} \mathrm{coeff}(O_{\mathrm{ref}}, b') . \tag{6}$$

Since we have UB = LB when the optimal solution has been found, and since
$\sum_{b' \in \mathrm{lits}(O_{\mathrm{ref}})} \mathrm{coeff}(O_{\mathrm{ref}}, b') \cdot \overline{b'}$ cannot possibly exceed $\sum_{b' \in \mathrm{lits}(O_{\mathrm{ref}})} \mathrm{coeff}(O_{\mathrm{ref}}, b')$,
the constraint (6) can be simplified to contradiction $0 \ge 1$.
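For readers who want the intermediate steps, the derivation of (6) can be spelled out as follows. This is a sketch under the assumption, consistent with Sect. 3, that the reformulated objective has the shape $O_{\mathrm{ref}} = \sum_{b'} c_{b'} \cdot b' + \mathrm{LB}$, where $c_{b'}$ abbreviates $\mathrm{coeff}(O_{\mathrm{ref}}, b')$ and the sums range over $\mathrm{lits}(O_{\mathrm{ref}})$.

```latex
\begin{align*}
  \sum_{b'} c_{b'} \cdot b' + \mathrm{LB}
      &\le O_{\mathrm{orig}} \le \mathrm{UB} - 1
      && \text{by (2) and (3a)} \\
  \sum_{b'} c_{b'} \cdot (1 - \overline{b'})
      &\le \mathrm{UB} - 1 - \mathrm{LB}
      && \text{using } b' = 1 - \overline{b'} \\
  \sum_{b'} c_{b'} \cdot \overline{b'}
      &\ge 1 - \mathrm{UB} + \mathrm{LB} + \sum_{b'} c_{b'}
      && \text{rearranging} \\
  \sum_{b'} c_{b'} \cdot \overline{b'}
      &\ge 1 + \sum_{b'} c_{b'}
      && \text{if } \mathrm{UB} = \mathrm{LB}
\end{align*}
```

The last line cannot be satisfied, since its left-hand side is at most $\sum_{b'} c_{b'}$.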


Intrinsic At-Most-One Constraints. Certifying intrinsic at-most-one constraints
for a set $S \subseteq \mathrm{lits}(O_{\mathrm{ref}})$ of literals requires deriving (i) the at-most-one constraint
stating that at most one $b \in S$ is assigned to 0 by any solution and (ii) constraints
defining the variable $\ell_S$. Such sets $S$ are detected by unit propagation that
implicitly derives implications $\overline{b}_i \Rightarrow b_j$ in the form of binary clauses $b_i + b_j \ge 1$
for every pair of variables in $S$. In the proof log, all these binary clauses can
be obtained by RUP steps, after which the at-most-one constraint $\sum_{b \in S} \overline{b} \le 1$
(which is $\sum_{b \in S} b \ge |S| - 1$ in normalized form) is derived by a standard cutting
planes derivation (see, e.g., [24]).
The reified constraints $\ell_S \Leftarrow \sum_{b \in S} b \ge |S|$ and $\ell_S \Rightarrow \sum_{b \in S} b \ge |S|$ defining
the variable $\ell_S$ (which are $\ell_S + \sum_{b \in S} \overline{b} \ge 1$ and $|S| \cdot \overline{\ell}_S + \sum_{b \in S} b \ge |S|$, respectively, in
normalized form) are derived by redundance-based strengthening. Note that the
latter constraint does not exist in the MaxSAT solver, but we need it in the proof
in order to derive the objective reformulation for the at-most-one constraint.


Hardening. Formally, hardening corresponds to deriving $\overline{b} \ge 1$ in the proof for
some literal $b \in \mathrm{lits}(O_{\mathrm{ref}})$ for which $\mathrm{UB} < \mathrm{LB} + \mathrm{coeff}(O_{\mathrm{ref}}, b)$ holds. Such an
inequality $\overline{b} \ge 1$ is implied by RUP if we first derive the constraint (6), since
assigning $b = 1$ makes (6) contradictory.


Upper Bound Estimation. A final technical proof logging detail is that some
implementations of the OLL algorithm for MaxSAT (including the Python-based
version of CGSS) do not use the actual cost of the solution found by the
SAT solver as the upper bound UB when hardening. In order to avoid the
overhead in Python of extracting the solution from the SAT solver, an upper bound
estimate $\mathrm{UB}_{\mathrm{est}}$ is computed instead based on the initial assignment passed to the
SAT solver in the call. Since any valid estimate is at least the cost of the solution
found (i.e., $\mathrm{UB}_{\mathrm{est}} \ge \mathrm{UB}$), hardening steps based on $\mathrm{UB}_{\mathrm{est}}$ can be justified by first
deriving $O_{\mathrm{orig}} \le \mathrm{UB}_{\mathrm{est}} - 1$, which follows from the latest objective-improving
constraint (3a). However, in order to handle solutions correctly in the proof, the
proof logging routines need to extract the solution found by the solver and
compute the actual cost, which means that a Python-based solver will not be able
to avoid this overhead when running with proof logging.


Worked-Out Example. We end this section with a complete, worked-out example
of OLL solving and proof logging for the toy MaxSAT instance $(F, O)$ with
formula $F = \{(b_1 \lor x), (\lnot x \lor b_2), (b_3 \lor b_4)\}$ and objective $O = 5b_1 + 5b_2 + b_3 + b_4$.

After initialization, the internal SAT solver of the OLL algorithm is loaded
with the clauses of $F$ and the proof consists of constraints (1)-(3) in Table 1.
The OLL search begins by invoking the SAT solver on the clauses in $F$ in order
to check the existence of any solutions. Assume the SAT solver returns the
solution $\alpha_1$ assigning $b_1 = b_3 = b_4 = 1$ and $b_2 = x = 0$. This solution has
objective value $O(\alpha_1) = O_{\mathrm{orig}}(\alpha_1) = 7$ so the algorithm updates UB = 7 and
logs the objective-improving constraint (4) in Table 1 equivalent to $O_{\mathrm{orig}} \le 6$.

Assume the stratification bound $w_{\mathrm{strat}}$ is initialised to 2. Then the solver is
invoked with $b_1 = b_2 = 0$ and returns the core $K_1 = b_1 + b_2 \ge 1$, which is added
to the proof as constraint (5). As already mentioned, core clauses are guaranteed
to be RUP with respect to the set of clauses in the SAT solver database, which
are also added to the proof.

For simplicity, we ignore WCE and structure sharing in this example, meaning
that the solver next reformulates the objective based on $K_1$, by introducing
clauses enforcing $y_{K_1,2} \Leftarrow (b_1 + b_2 \ge 2)$ for the new counting variable $y_{K_1,2}$. This
is done by (i) introducing the pseudo-Boolean constraints (6) and (7) in Table 1
by reification, and (ii) deriving the clauses corresponding to these constraints.
While the MaxSAT solver only uses the implication (6), the proof also requires
constraint (7) corresponding to $y_{K_1,2} \Rightarrow (b_1 + b_2 \ge 2)$. Conveniently, in this
toy example $y_{K_1,2} \Leftarrow (b_1 + b_2 \ge 2)$ is already the clause $\overline{b}_1 + \overline{b}_2 + y_{K_1,2} \ge 1$,
so step (ii) is not needed. For the general case, we derive totalizer clauses as
explained in Sect. 4. Conceptually, we now replace $5b_1 + 5b_2$ by $5y_{K_1,2} + 5$ to
obtain the reformulated objective $O_{\mathrm{ref}} = b_3 + b_4 + 5y_{K_1,2} + 5$ with lower bound
LB = 5. The core $K_1$ says that at least one of $b_1$ and $b_2$ must be true, thus
incurring a cost of 5, and $y_{K_1,2}$ is added to the objective to indicate if both of
them incur cost.

Table 1. Example proof produced by a certified OLL solver.

id   | Pseudo-Boolean constraint | Justification
(1)  | $b_1 + x \ge 1$ | input
(2)  | $b_2 + \overline{x} \ge 1$ | input
(3)  | $b_3 + b_4 \ge 1$ | input
(4)  | $5\overline{b}_1 + 5\overline{b}_2 + \overline{b}_3 + \overline{b}_4 \ge 6$ | log solution $\alpha_1$
(5)  | $b_1 + b_2 \ge 1$ | RUP
(6)  | $\overline{b}_1 + \overline{b}_2 + y_{K_1,2} \ge 1$ | reification
(7)  | $2\overline{y}_{K_1,2} + b_1 + b_2 \ge 2$ | reification
(8)  | $5b_1 + 5b_2 + 5\overline{y}_{K_1,2} \ge 10$ | $(((5) + (7))/2) \cdot 5$
(9)  | $\overline{b}_3 + \overline{b}_4 + 5\overline{y}_{K_1,2} \ge 6$ | (4) + (8)
(10) | $\overline{y}_{K_1,2} \ge 1$ | RUP
(11) | $b_3 + b_4 \ge 1$ | RUP
(12) | $\overline{b}_3 + \overline{b}_4 + y_{K_2,2} \ge 1$ | reification
(13) | $2\overline{y}_{K_2,2} + b_3 + b_4 \ge 2$ | reification
(14) | $b_3 + b_4 + \overline{y}_{K_2,2} \ge 2$ | ((11) + (13))/2
(15) | $5\overline{b}_1 + 5\overline{b}_2 + \overline{b}_3 + \overline{b}_4 \ge 7$ | log solution $\alpha_2$
(16) | $5b_1 + 5b_2 + b_3 + b_4 + 5\overline{y}_{K_1,2} + \overline{y}_{K_2,2} \ge 12$ | (8) + (14)
(17) | $5\overline{y}_{K_1,2} + \overline{y}_{K_2,2} \ge 7$ | (15) + (16)

Since it now holds that $\mathrm{coeff}(O_{\mathrm{ref}}, y_{K_1,2}) + \mathrm{LB} = 5 + 5 > 7 = \mathrm{UB}$, the
literal $y_{K_1,2}$ is hardened to 0. In order to certify this hardening step, i.e., derive
$\overline{y}_{K_1,2} \ge 1$, the proof logger first derives the objective reformulation constraint
$5b_1 + 5b_2 + b_3 + b_4 \ge b_3 + b_4 + 5y_{K_1,2} + 5$ enforced by line (8) in Table 1.
The objective-improving and objective reformulation constraints are then added
together to get constraint (9), after which $\overline{y}_{K_1,2} \ge 1$ is obtained by a RUP step.

The next SAT solver call with $b_3 = b_4 = 0$ returns as core the input clause
$b_3 + b_4 \ge 1$, and reformulation (lines (11)-(13)) yields $O_{\mathrm{ref}} = 5y_{K_1,2} + y_{K_2,2} + 6$
with LB = 6. Now suppose the SAT solver finds the solution $\alpha_2$ with $b_2 = b_3 =
x = 1$ and all other variables set to 0, resulting in the objective-improving
constraint (15). Since $O_{\mathrm{orig}}(\alpha_2) = 6 = \mathrm{LB}$, the solver terminates and reports $\alpha_2$ to
be optimal. To certify that this is correct, another objective reformulation
constraint (16) is derived, after which the contradictory constraint (17) is obtained
by adding (15) and (16). This proves that solutions with cost less than 6 do not
exist.
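The cutting planes steps of this example are small enough to replay by hand or in code. The Python sketch below is an illustration, not VERIPB syntax: the helpers are hypothetical, negated literals are written with a `~` prefix, and `y1`, `y2` stand for the two counting variables. It recomputes lines (8), (9), (14), (16), and (17) of Table 1 as read here from the extracted table, and confirms by enumeration that the optimum of the toy instance is 6.

```python
from itertools import product


def add(c1, c2):
    """Add two constraints (coeffs, degree), cancelling x against ~x."""
    coeffs = dict(c1[0])
    for lit, a in c2[0].items():
        coeffs[lit] = coeffs.get(lit, 0) + a
    degree = c1[1] + c2[1]
    for lit in [l for l in coeffs if not l.startswith("~")]:
        cancel = min(coeffs[lit], coeffs.get("~" + lit, 0))
        if cancel:
            coeffs[lit] -= cancel
            coeffs["~" + lit] -= cancel
            degree -= cancel  # a*x + a*~x = a moves to the right-hand side
    return {l: a for l, a in coeffs.items() if a > 0}, degree


def multiply(c, factor):
    return {l: factor * a for l, a in c[0].items()}, factor * c[1]


def divide(c, divisor):
    """Division with rounding up of coefficients and degree."""
    return {l: -(-a // divisor) for l, a in c[0].items()}, -(-c[1] // divisor)


c4 = ({"~b1": 5, "~b2": 5, "~b3": 1, "~b4": 1}, 6)   # log solution alpha_1
c5 = ({"b1": 1, "b2": 1}, 1)                         # core K_1
c7 = ({"~y1": 2, "b1": 1, "b2": 1}, 2)               # reification of y_{K_1,2}
c8 = multiply(divide(add(c5, c7), 2), 5)
c9 = add(c4, c8)
c11 = ({"b3": 1, "b4": 1}, 1)                        # core K_2
c13 = ({"~y2": 2, "b3": 1, "b4": 1}, 2)              # reification of y_{K_2,2}
c14 = divide(add(c11, c13), 2)
c15 = ({"~b1": 5, "~b2": 5, "~b3": 1, "~b4": 1}, 7)  # log solution alpha_2
c16 = add(c8, c14)
c17 = add(c15, c16)
contradiction = sum(c17[0].values()) < c17[1]        # left side can never reach the degree


def brute_force_optimum():
    """Minimum of 5*b1 + 5*b2 + b3 + b4 over all models of F."""
    costs = [5 * b1 + 5 * b2 + b3 + b4
             for x, b1, b2, b3, b4 in product([0, 1], repeat=5)
             if (b1 or x) and ((not x) or b2) and (b3 or b4)]
    return min(costs)


print(c9)                     # {'~b3': 1, '~b4': 1, '~y1': 5} with degree 6
print(c17, contradiction)     # {'~y1': 5, '~y2': 1} with degree 7, True
print(brute_force_optimum())  # 6
```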
Fig. 1. Running time of CGSS with and without proof logging.

Fig. 2. CGSS running time compared to time required for proof checking.


5 Experimental Evaluation 


To evaluate the proof logging techniques developed in this paper, we have imple- 
mented them in the state-of-the-art MaxSAT solver CGSS [22,47], which uses 
the OLL algorithm and structure-sharing totalizers. We employed VERIPB [76], 
extended to parse MaxSAT instances in the standard WCNF format, to verify 
the certificates of correctness emitted by the certifying solver. 

Our experiments were conducted on machines with an 11th Gen Intel(R) 
Core(TM) i5-1145G7 @ 2.60 GHz CPU and 16 GB of memory. Each benchmark 
ran exclusively on a single machine with a memory limit of 14GB and a time 
limit of 3600s for solving with CGSS and 36000s for checking the certificates 
with VERIPB. As benchmarks we used all 594 weighted and 607 unweighted 
instances from the complete track of the MaxSAT Evaluation 2022 [61], where 
an instance $(F, O)$ is unweighted if all coefficients $\mathrm{coeff}(O, \ell)$ are equal. The data
from our experiments can be found in [12]. 


Overhead of Proof Logging. To evaluate the overhead in solver running time, we 
compared the standard CGSS solver [23] without proof logging (but with the 
bug fixes discussed below) to CGSS with proof logging as described in this paper. 
With proof logging 803 instances are solved within the resource limits, which is 
3 instances fewer than without proof logging (see Fig. 1). Adding proof logging
slowed down CGSS by about 8.8% in the median over all solved instances. For 
95% of the instances CGSS with proof logging was at most 36.2% slower. Thus, 
the proof logging overhead seems perfectly manageable and should present no 
serious obstacles to using proof logging in core-guided MaxSAT solvers. 


Overhead of Proof Checking. To assess the efficiency of proof checking, we
compared the running time of CGSS with proof logging to the time taken by
VERIPB for checking the generated proofs. The instances that were not solved
by CGSS within the resource limits were filtered out, since the running time for
checking an incomplete proof is inconclusive.

Table 2. Illustration of discovered bug (where $y_{i,k}$ should be read as $y_{K_i,k}$).

#iter | Literals considered ($w_{\mathrm{strat}} = 2$) | Core $K_{\#iter}$ extracted
1 | $\{b_i, e_i \mid i = 1, \ldots, 5\}$ | $K_1 = \sum_{i=1}^{5} b_i \ge 1$
2 | $\{e_i \mid i = 1, \ldots, 5\} \cup \{y_{1,2}\}$ | $K_2 = y_{1,2} + e_2 + e_4 \ge 1$
3 | $\{e_i \mid i = 1, \ldots, 3, 5\} \cup \{y_{1,2}, y_{1,3}\} \cup \{y_{2,2}\}$ | $K_3 = y_{1,3} + e_1 + e_2 + e_5 \ge 1$
4 | $\{e_i \mid i = 1, \ldots, 3\} \cup \{y_{1,2}, y_{1,4}\} \cup \{y_{2,2}, y_{3,2}\}$ | $K_4 = y_{1,2} + e_1 + e_2 \ge 1$
5 | $\{e_i \mid i = 1, \ldots, 3\} \cup \{y_{1,4}\} \cup \{y_{2,2}, y_{3,2}, y_{4,2}\}$ | $K_5 = e_1 + e_2 + e_3 + y_{1,4} + y_{2,2} \ge 1$
6 | $\{e_3\} \cup \{y_{1,5}\} \cup \{y_{2,3}\} \cup \{y_{3,2}, y_{4,2}, y_{5,2}\}$ | Result is SAT

#iter | $O_{\mathrm{ref}}$ (after reformulation of $K_{\#iter}$)
0 | $10(\sum_{i=1}^{5} b_i) + 11e_1 + 14e_2 + 11e_3 + 3e_4 + 2e_5 + o_1 + o_2$
1 | $11e_1 + 14e_2 + 11e_3 + 3e_4 + 2e_5 + 10y_{1,2} + o_1 + o_2 + 10$
2 | $11e_1 + 11e_2 + 11e_3 + 2e_5 + 7y_{1,2} + 3y_{1,3} + 3y_{2,2} + o_1 + o_2 + 13$
3 | $9e_1 + 9e_2 + 11e_3 + 7y_{1,2} + y_{1,3} + 2y_{1,4} + 3y_{2,2} + 2y_{3,2} + o_1 + o_2 + 15$
4 | $2e_1 + 2e_2 + 11e_3 + 8y_{1,3} + 2y_{1,4} + 3y_{2,2} + 2y_{3,2} + 7y_{4,2} + o_1 + o_2 + 22$
5 | $9e_3 + 8y_{1,3} + 2y_{1,5} + y_{2,2} + 2y_{2,3} + 2y_{3,2} + 7y_{4,2} + 2y_{5,2} + o_1 + o_2 + 24$

VERIPB successfully checked the proofs for 747 out of the 803 instances 
solved by CGSS (see Fig. 2); 42 instances failed due to the memory limit and 14 
instances failed due to the time limit. Checking the proof took about 3 times the 
solving time in the median for successfully checked instances. About 87% of the 
successfully checked instances were checked within 10 times the solving time. 

Proof checking time compared to solver running time varies widely, but our 
experiments indicate that the performance of VERIPB is sufficient in most cases, 
and verification time scales linearly with the size of the proof for a majority of 
the instances. However, there is room to improve VERIPB, where focus so far has 
been on proof logging strength rather than performance. For the instances where 
checking is 100 times slower than solving, the main bottleneck is the proof gen- 
erated by the SAT solver, which could be addressed by standard techniques for 
checking DRAT proofs, and checking logged solutions (when objective improving 
constraints (3a) are added) could also be implemented more efficiently. 


Bugs Discovered by Proof Logging. Our work on implementing proof logging in 
CGSS led to the discovery of two bugs, which were also present in the solver 
RC2 on which CGSS is based, but have now been fixed in CGSS in com- 
mit 5526d04 and in RC2 in commit d0447c3. The bugs are due to a slightly 
different implementation of OLL compared to the description in Sect. 3. 

First, when a counting variable $y_{K_{\mathrm{old}},i}$ for a core $K_{\mathrm{old}}$ appears for the first
time in a later core $K_{\mathrm{new}}$, the next counting variable $y_{K_{\mathrm{old}},i+1}$ is added to the
reformulated objective with coefficient $w(K_{\mathrm{new}}, O^{\mathrm{new}}_{\mathrm{ref}})$ rather than $w(K_{\mathrm{old}}, O^{\mathrm{old}}_{\mathrm{ref}})$.
The coefficient of $y_{K_{\mathrm{old}},i+1}$ is then further increased when $y_{K_{\mathrm{old}},i}$ is found in
future cores. Second, rather than computing the upper bound UB from an actual
solution, CGSS uses a weaker estimate $\mathrm{UB}_{\mathrm{est}}$ obtained by summing the current
lower bound and the coefficients of all literals $b$ where $\mathrm{coeff}(O_{\mathrm{ref}}, b) < w_{\mathrm{strat}}$
(meaning that these literals were not set to 0 in the SAT solver call, and so
could potentially be true in the solution).

The bugs we detected could lead to the solver producing an overly optimistic
estimate $\mathrm{UB}_{\mathrm{est}} < \mathrm{UB}$. The first way this can happen is when the contributions
of counting variables $y_{K,i}$ in the reformulated objective are underestimated due
to too small coefficients. The second bug is when the coefficient of $y_{K_{\mathrm{old}},i+1}$ is
first lowered below $w_{\mathrm{strat}}$ and then raised above this threshold again when $y_{K_{\mathrm{old}},i}$
is found in a core. Then CGSS fails to assume $y_{K_{\mathrm{old}},i+1} = 0$ in future solver calls.
These bugs can result in erroneous hardening as detailed in the next example.


Example 1. Given a MaxSAT instance $(F, O)$ with $F = \{(\bigvee_{i=1}^{5} b_i), (o_1 \lor o_2)\} \cup
\{b_i \lor e_i \mid i = 1, \ldots, 5\}$ and $O = (\sum_{i=1}^{5} 10 \cdot b_i) + 11 \cdot e_1 + 14 \cdot e_2 + 11 \cdot e_3 + 3 \cdot e_4 +
2 \cdot e_5 + o_1 + o_2$, assume the stratification bound is $w_{\mathrm{strat}} = 2$. Table 2 displays
a possible CGSS run for this instance, except that for simplicity we assume
one core extraction per iteration and no use of any other heuristics. The upper
half of the table lists the variables set to 0 in solver calls, the extracted core,
and the lower bound derived from it. The lower half of the table provides the
reformulated objective. Even though the coefficient of $y_{K_1,3}$ is increased to 8
after the fourth core, this variable is not set to 0 in subsequent iterations, which
allows the solver to finish the stratification level after extracting 6 cores with a
solution that sets to true the variables $b_1, b_2, b_3, b_5, e_4, o_1, o_2, y_{K_2,2}$ and $y_{K_1,i}$ for
$i = 1, \ldots, 4$, and all other variables to false. The cost of this solution is 45.

Now CGSS would incorrectly estimate $\mathrm{UB}_{\mathrm{est}} = \mathrm{LB} + 4 = 28$, since $y_{K_1,3}$
and $y_{K_2,2}$ (abbreviated as $y_{1,3}$ and $y_{2,2}$ in the table) both have coefficient 1 in
the current reformulated objective. This is lower than the cost 45 of the solution
found (and even than the optimum 36), and erroneously allows hardening
(which considers $y_{K_1,3}$ with the correct coefficient 8) to fix $y_{K_1,3} = 0$, even though
$b_1$, $b_2$ and $b_3$ (and hence also $y_{K_1,3}$) are true in every minimal-cost solution.
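The claims about costs in Example 1 can be checked by brute force, since the instance has only twelve variables. The Python sketch below assumes the clause set is read as one clause over $b_1, \ldots, b_5$, the clause $o_1 \lor o_2$, and the five clauses $b_i \lor e_i$, and that the solution found sets $b_1, b_2, b_3, b_5, e_4, o_1, o_2$ to true among the original variables; both readings are reconstructions from the extracted text, and all names in the code are hypothetical.

```python
from itertools import product

COEFFS = {"b1": 10, "b2": 10, "b3": 10, "b4": 10, "b5": 10,
          "e1": 11, "e2": 14, "e3": 11, "e4": 3, "e5": 2, "o1": 1, "o2": 1}
NAMES = list(COEFFS)


def feasible(a):
    """Check the clauses of F under assignment a (dict variable -> 0/1)."""
    return bool(any(a[f"b{i}"] for i in range(1, 6))
                and (a["o1"] or a["o2"])
                and all(a[f"b{i}"] or a[f"e{i}"] for i in range(1, 6)))


def cost(a):
    return sum(COEFFS[v] * a[v] for v in NAMES)


models = [dict(zip(NAMES, bits)) for bits in product([0, 1], repeat=len(NAMES))]
models = [a for a in models if feasible(a)]
optimum = min(cost(a) for a in models)
optimal = [a for a in models if cost(a) == optimum]

# The (non-optimal) solution assumed to be found at the end of the stratification level.
reported = {v: int(v in {"b1", "b2", "b3", "b5", "e4", "o1", "o2"}) for v in NAMES}

print(optimum)                                               # 36
print(cost(reported))                                        # 45
print(all(a["b1"] and a["b2"] and a["b3"] for a in optimal))  # True
```

Both values exceed the faulty estimate of 28, which is why hardening based on that estimate is unsound for this instance.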


In our computational experiments there were cases of faulty hardening, but 
all incorrectly fixed values happened to agree with some optimal solution and so 
we never observed incorrect results. Proof logging detected the problem, however, 
since the derivations of the buggy hardening steps failed during proof checking. 
Interestingly, what proof logging did not turn up were any examples of mistaken
claims $O_{\mathrm{orig}} \le \mathrm{UB}_{\mathrm{est}} - 1$ when the cost of a found solution was estimated. The
issue with mistaken estimates due to faulty stratification was instead discovered
while analyzing and fixing the hardening bug. The moral of this is that even
if all results are certified as correct, this does not certify that the code is free
from bugs that have not yet manifested themselves. However, proof logging still
guarantees that even if the solver has undiscovered bugs, we can always
trust computed results for which the accompanying proofs pass verification.
6 Concluding Remarks


In this work, we develop pseudo-Boolean proof logging techniques for core-guided 
MaxSAT solving and implement them in the solver CGSS [47] with support 
for the full range of sophisticated reasoning techniques it uses. To the best of 
our knowledge, this is the first time a state-of-the-art MaxSAT solver has been 
enhanced to output machine-verifiable proofs of correctness. We have made a 
thorough evaluation on benchmarks from the MaxSAT Evaluation 2022 using the 
VERIPB proof checker [17,42], and find that proof logging overhead is perfectly 
manageable and that proof verification time, while leaving room for improvement, 
is definitely practically feasible. Our work also showcases the benefit of proof 
logging as a debugging tool—erroneous proofs produced by CGSS revealed two 
subtle bugs in the solver that previous extensive testing had failed to uncover. 

Regarding proof verification time, further investigation is needed into the rare 
cases where verification is much slower (say, more than a factor 10) than solving. 
There are reasons to believe, though, that this is not a problem of MaxSAT proof 
logging per se, but rather is explained by features not yet added to VERIPB, 
which is a tool currently undergoing very active development. So far, the proof 
checker has been optimized for other types of reasoning than the clausal reverse 
unit propagation (RUP) steps that dominate SAT proofs. Also, VERIPB lacks 
the ability to trim proofs during checking as in [44]. Finally, introducing a binary 
proof format in addition to plain-text proofs would be another way to boost 
performance of proof checking. But these are matters of engineering rather than 
research, and can be taken care of once the proof logging technology as such has 
been developed and has proven its worth. 

The focus of this work is on core-guided MaxSAT solving, but we would like 
to extend our techniques to solvers using linear SAT-UNSAT (LSU) solving (such 
as PACOSE [68]) and implicit hitting set (IHS) search (such as MAXHS [28, 29]). 
Although there are certainly nontrivial technical challenges that will need to be 
overcome, we are optimistic that our work paves the way towards a unified proof 
logging system for the full range of modern MaxSAT solving approaches. Going 
beyond MaxSAT, it would also be interesting to extend VERIPB proof logging 
to pseudo-Boolean solvers using core-guided search [30] or IHS [73,74], and per- 
haps even to similar techniques in constraint programming [36] and answer set 
programming [5]. 
Abstract. Classically, in saturation-based proof systems, unification
has been considered atomic. However, it is also possible to move unifica- 
tion to the calculus level, turning the steps of the unification algorithm 
into inferences. For calculi that rely on unification procedures returning 
large or even infinite sets of unifiers, integrating unification into the cal- 
culus is an attractive method of dovetailing unification and inference. 
This applies, for example, to AC-superposition and higher-order super- 
position. We show that first-order superposition remains complete when 
moving unification rules to the calculus level. We discuss some of the 
benefits this has even for standard first-order superposition and provide 
an experimental evaluation. 


1 Introduction 


Unification is a key feature in many proof calculi, particularly those based on 
the saturation framework. It acts as a filter, reducing the number of inferences 
that need to be carried out by instantiating terms only to the degree necessary. 
However, many unification algorithms have large time complexities and produce 
large, or even infinite, sets of unifiers. This is the case, for example, for AC- 
unification, which can produce a doubly exponential number of unifiers [10], and 
higher-order unification, which can produce an infinite set of unifiers [20]. This 
motivates the study of how unification rules can be integrated into proof calculi 
to allow them to dovetail with standard calculus rules. One way to achieve this 
is to use the concept of unification with abstraction [13,17]. The general idea 
is that during the unification process, instead of solving all unification pairs, 
certain pairs are retained and added to the conclusion of an inference as negative 
constraint literals. Calculus-level unification inferences then work on such literals 
to solve these constraints and remove the literals in the case they are unifiable. 
Note how this differs from constrained resolution-style calculi such as [4,15] 
where the constraints are completely separate from the rest of the clause and 
are not subject to inferences. 

To demonstrate the idea of dedicated unification inferences in combination 
with unification with abstraction, we provide the following example. 


$C_1 = f(g(a,x)) \not\approx t \qquad C_2 = f(g(a,b)) \approx t$
© The Author(s) 2023 
B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 23—40, 2023. 
https: //doi.org/10.1007/978-3-031-38499-8_2 
A standard superposition calculus would proceed by unifying $f(g(a,b))$ and
$f(g(a,x))$ with the unifier $\sigma = \{x \mapsto b\}$ and then rewriting $C_1$
with $C_2$ to derive $t\sigma \not\approx t\sigma$. Equality resolution on
$t\sigma \not\approx t\sigma$ would then derive $\bot$. It is also possible to
proceed by rewriting $C_1$ with $C_2$ without computing $\sigma$ and instead add
the constraint literal $g(a,x) \not\approx g(a,b)$ to the conclusion to derive
$t \not\approx t \lor g(a,x) \not\approx g(a,b)$. A dedicated unification
inference could then decompose the constraint literal, resulting in
$t \not\approx t \lor a \not\approx a \lor b \not\approx x$. Further unification
inferences could bind $x$ to $b$, and remove the trivial pairs $a \not\approx a$
and $t \not\approx t$ to derive $\bot$.

In this paper, we investigate moving unification to the calculus level for stan- 
dard first-order superposition. Whilst this may seem like a regressive step, as we 
lose much of unification’s power to act as a filter on inferences and hence produce 
many more clauses, we think the investigation is valuable for two reasons. 

Firstly, by showing how syntactic first-order unification can be lifted to the 
calculus level, we provide a roadmap for how more complex unification problems 
can be lifted to the calculus level. This may prove particularly useful in the 
higher-order case, where abstraction may expose terms to standard calculus rules 
that were unavailable before. Moreover, we note that in our calculus we do not 
turn the entire unification problem into a constraint, but rather a subproblem. 
Whilst this may be merely an interesting detail for first-order unification, for 
more complex unification problems, such a method could be used to eagerly 
solve simple unification subproblems whilst delaying complex subproblems by 
adding them as constraints. 

Secondly, one of the most expensive operations in first-order theorem provers 
is the maintenance of indices. Indices are crucial to the performance of modern 
solvers, as they facilitate the efficient retrieval of terms unifiable or matchable 
with a query term. However, solvers typically spend a large amount of time 
inserting and removing terms from indices as well as unifying against terms 
in the indices. This is particularly the case in the presence of the AVATAR 
architecture [24] wherein a change in the model can trigger the insertion and 
removal of thousands of terms from various indices. By moving unification to 
the calculus level, we can replace complex indices with simple hash maps, since 
to trigger an inference we merely need to check for top symbol equality and not 
unifiability. Insertion and deletion become O(1) time operations. However, for 
first-order logic, we do not expect the time gained to offset the downsides of 
extra inferences carried out and extra clauses created. Our experimental results 
back up this hypothesis (see Sect. 7). Our main contributions are: 


- Designing a modified superposition calculus that moves unification to the
  calculus level (Sect. 3).

- Proving the calculus to be statically and dynamically refutationally complete
  (Sect. 5).

- Providing a thorough empirical evaluation of the calculus (Sect. 7).
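
The replacement of term indices by hash maps mentioned in the introduction can be pictured with a small sketch. Everything in it is an illustrative assumption rather than a description of the Vampire implementation: the class name TopSymbolIndex, the encoding of terms as nested tuples, and the decision to keep all variable entries in one shared bucket so that partners for variable left-hand sides are returned as well.

```python
from collections import defaultdict

# Terms: a variable is a str ("x"); an application is a tuple
# (symbol, arg1, ..., argn), so the constant a is ("a",).


class TopSymbolIndex:
    """Hypothetical hash-map index keyed on the top symbol of a term."""

    VARIABLE = object()  # one bucket shared by all variable entries

    def __init__(self):
        self._buckets = defaultdict(set)

    @classmethod
    def _key(cls, term):
        if isinstance(term, str):
            return cls.VARIABLE
        return (term[0], len(term) - 1)  # symbol and arity

    def insert(self, term, clause_id):
        # Average-case constant time: one hash lookup, one set insertion.
        self._buckets[self._key(term)].add((term, clause_id))

    def remove(self, term, clause_id):
        self._buckets[self._key(term)].discard((term, clause_id))

    def candidates(self, query):
        """Entries sharing the query's top symbol, plus all variable entries."""
        same_top = set(self._buckets.get(self._key(query), ()))
        return same_top | self._buckets.get(self.VARIABLE, set())
```

No unifiability check happens at retrieval time in this sketch; any mismatch below the top symbol would surface later as an unsolvable constraint literal.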


2 Preliminaries 


Syntax. We consider standard monomorphic first-order logic with equality. We 
assume a signature consisting of a finite set of (monomorphically) typed function 
symbols and a single predicate, equality, denoted by $\approx$. A non-equality
atom $A$ can be expressed using equality as $A \approx \top$, where $\top$ is a
special function symbol [18]. Terms are formed in the normal way from variables
and function symbols. We commonly use $s$, $t$ or $u$ or their primed variants
to refer to terms. We write $s : \tau$ to show that term $s$ has type $\tau$. A
term is ground if it contains no variables. We use the notation $\bar{s}_n$ to
refer to a tuple or list of terms of length $n$. More generally, we use the over
bar notation to refer to tuples and lists of various objects. Where the length
of the tuple or list is not relevant, we drop the subscript. By $s_i$ we denote
the $i$th element of the tuple $\bar{s}_n$. Literals are positive or negative
equalities written as $s \approx t$ and $s \not\approx t$ respectively. We use
$s \mathrel{\dot\approx} t$ to refer to either a positive or a negative
equality. Clauses are multisets of literals. A clause that contains no literals
is known as the empty clause and denoted $\bot$.

A substitution is a mapping from variables to terms. We assume, w.l.o.g.,
that all substitutions are idempotent. We commonly denote substitutions using
$\sigma$ and $\theta$ and denote the application of a substitution $\sigma$ to a
term $s$ by $s\sigma$. A substitution $\theta$ is grounding for a term $s$ if
$s\theta$ is ground. The definition of grounding substitution can be extended to
literals and clauses in the obvious manner. A substitution $\sigma$ is a unifier
of terms $s$ and $t$ if $s\sigma = t\sigma$. A unifier $\sigma$ is more general
than a unifier $\sigma'$ if there exists a substitution $\rho$ such that
$\sigma\rho = \sigma'$. With respect to syntactic first-order unification, if
two terms are unifiable then they have a single most general unifier up to
variable naming [1].
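
As a concrete reading of these definitions, the following sketch applies substitutions and computes an idempotent most general unifier in the style of the classical Robinson procedure. The tuple encoding of terms and the function names apply_subst, occurs and mgu are assumptions made for illustration only.

```python
# Terms: a variable is a str ("x"); an application is a tuple
# (symbol, arg1, ..., argn), so the constant a is ("a",).


def apply_subst(subst, term):
    """Apply a substitution (dict from variable to term) to a term."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(apply_subst(subst, arg) for arg in term[1:])


def occurs(var, term):
    """Occurs check: does the variable var appear in term?"""
    if isinstance(term, str):
        return var == term
    return any(occurs(var, arg) for arg in term[1:])


def mgu(s, t):
    """Return an idempotent most general unifier of s and t, or None."""
    subst = {}
    pending = [(s, t)]
    while pending:
        left, right = pending.pop()
        left, right = apply_subst(subst, left), apply_subst(subst, right)
        if left == right:
            continue  # trivial pair
        if isinstance(right, str) and not isinstance(left, str):
            left, right = right, left  # orient the variable to the left
        if isinstance(left, str):
            if occurs(left, right):
                return None
            # Compose with the new binding so the result stays idempotent.
            subst = {v: apply_subst({left: right}, u) for v, u in subst.items()}
            subst[left] = right
        elif left[0] == right[0] and len(left) == len(right):
            pending.extend(zip(left[1:], right[1:]))  # decompose
        else:
            return None  # symbol clash
    return subst
```

On the introductory example, calling mgu on the encodings of f(g(a,x)) and f(g(a,b)) yields the single binding of x to b.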

A transitive irreflexive relation over terms is known as an ordering. The
superposition calculus we present below is, as usual, parameterised by a
simplification ordering on ground terms. An ordering $\succ$ is a simplification
ordering if it possesses the following properties. It is total on ground terms.
It is compatible with contexts, meaning that if $s \succ t$, then
$u[s] \succ u[t]$. It is well-founded. Note that every simplification ordering
has the subterm property, namely that if $t$ is a proper subterm of $s$, then
$s \succ t$. For non-ground terms, the only property that is required of the
ordering is that it is stable under substitution. That is, if $s \succ t$ then
for all substitutions $\theta$, $s\theta \succ t\theta$. We extend the ordering
$\succ$ to literals in the standard fashion via its multiset extension. A
positive literal $s \approx s'$ is treated as the multiset $\{s, s'\}$, whilst a
negative literal $s \not\approx s'$ is treated as the multiset
$\{s, s, s', s'\}$. The ordering is extended to clauses by its two-fold multiset
extension. We use $\succ$ to denote the ordering on terms and its multiset
extensions to literals and clauses.
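
The lifting of a term ordering to literals and clauses can be sketched as follows. The term ordering used here (size first, then Python's built-in tuple order) is only a toy stand-in for a real simplification ordering such as KBO or LPO, and the names multiset_greater, literal_terms, literal_greater and clause_greater are hypothetical. Literals are encoded as triples (s, t, positive) over ground terms.

```python
from collections import Counter


def multiset_greater(greater, left, right):
    """Multiset extension: left > right iff they differ and every element
    only in right is dominated by some element only in left."""
    left_only = Counter(left) - Counter(right)
    right_only = Counter(right) - Counter(left)
    if not left_only and not right_only:
        return False  # equal multisets
    return all(any(greater(m, n) for m in left_only) for n in right_only)


def size(term):
    """Number of symbols in a ground term encoded as (symbol, args...)."""
    return 1 + sum(size(arg) for arg in term[1:])


def term_greater(s, t):
    # Toy ground ordering (assumption): compare by size, then by tuple order.
    return (size(s), s) > (size(t), t)


def literal_terms(literal):
    """Positive s = t becomes {s, t}; negative s != t becomes {s, s, t, t}."""
    s, t, positive = literal
    return [s, t] if positive else [s, s, t, t]


def literal_greater(l1, l2):
    return multiset_greater(term_greater, literal_terms(l1), literal_terms(l2))


def clause_greater(c1, c2):
    # Two-fold multiset extension: clauses are multisets of literals.
    return multiset_greater(literal_greater, c1, c2)
```

One consequence visible in the sketch is that a negative literal is larger than the positive literal over the same two terms, because its multiset contains each term twice.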


Semantics. An interpretation is a pair $(\mathcal{U}, \mathcal{J})$, where
$\mathcal{U}$ is a set of typed universes and $\mathcal{J}$ is an interpretation
function, such that for each function symbol
$f : \tau_1 \times \cdots \times \tau_n \to \tau$ in the signature,
$\mathcal{J}(f)$ is a concrete function of type
$\mathcal{U}_{\tau_1} \times \cdots \times \mathcal{U}_{\tau_n} \to \mathcal{U}_{\tau}$.
A valuation $\xi$ is a function that maps each variable $x : \tau$ to a member
of $\mathcal{U}_{\tau}$. For a given interpretation $\mathcal{M}$ and valuation
$\xi$, we use $[\![t]\!]^{\xi}_{\mathcal{M}}$ to represent the denotation of $t$
in $\mathcal{M}$ given $\xi$. A positive literal $s \approx t$ is true in an
interpretation $\mathcal{M}$ for valuation $\xi$ if
$[\![s]\!]^{\xi}_{\mathcal{M}} = [\![t]\!]^{\xi}_{\mathcal{M}}$, and false
otherwise. A negative literal $s \not\approx t$ is true in an interpretation
$\mathcal{M}$ for valuation $\xi$ if $s \approx t$ is false. A clause $C$ holds
in an interpretation $\mathcal{M}$ for valuation $\xi$ if one of its literals is
true in $\mathcal{M}$ for $\xi$. An interpretation $\mathcal{M}$ models a clause
$C$ if $C$ holds in $\mathcal{M}$ for every valuation. An interpretation models
a clause set if it models every clause in the set. A set of clauses $M$ entails
a set of clauses $N$, denoted $M \models N$, if every model of $M$ is also a
model of $N$.


3 Calculus 


Intuitively, what we are aiming for with our calculus is that whenever standard
superposition applies a substitution $\sigma$ to a conclusion with the side
condition “$\sigma$ is a unifier of terms $t_1$ and $t_2$”, our calculus adds a
constraint $t_1 \not\approx t_2$ to the conclusion. The calculus then has
further inference rules that mimic the steps of a
first-order unification algorithm and work on negative literals. Our presentation 
below does not quite follow this intuition. Instead, if the unification problem is 
trivial we solve it immediately. If it is non-trivial, we carry out a single step of 
unification and add the resulting sub-problems as constraints. Our reasons for 
doing this are two-fold. 


1. Adding the entire unification problem $t_1 \not\approx t_2$ as a constraint
can lead to a constraint literal that is larger, with respect to $\succ$, than
any literal occurring in the premises. This causes difficulties in the
completeness proof.

2. More pertinently, keeping in mind our planned applications to more complex 
logics, we wish to show that delayed unification remains complete even when 
only selected sub-problems of the original unification problem are added as 
constraints. In the context of higher-order logic, for example, this could allow 
for the eager solving of simple unification sub-problems whilst only the most 
difficult are added as constraints. See Sect. 6 for further details. 


Wherever we present a clause as a subclause $C'$ and a literal $l$ (e.g.
$C' \lor l$), we denote the entire clause by the same name as the subclause
without the dash (e.g. we refer to the clause $C' \lor l$ by $C$). As in the
classical superposition calculus, our calculus is parameterised by a selection
function that is used to restrict the number of applicable inferences in order
to avoid the search space growing unnecessarily. A selection function $sel$ is a
function that maps a clause to a subset of its negative literals. We say that
literal $l$ is $\sigma$-eligible in a clause $C' \lor l$ if it is selected in
$C$ ($l \in sel(C)$), or there are no selected literals and $l\sigma$ is maximal
in $C\sigma$. Strict $\sigma$-eligibility is defined in a like fashion, with
maximality replaced by strict maximality. Where $\sigma$ is empty, we sometimes
speak of eligibility instead of $\sigma$-eligibility. In what follows, $CS$ is a
multiset of literals that we refer to as constraints.


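\[
\frac{D' \lor f(\bar{t}_n) \approx t' \qquad C' \lor s[f(\bar{s}_n)] \mathrel{\dot\approx} s'}
     {C' \lor D' \lor s[t'] \mathrel{\dot\approx} s' \lor CS}\ \textsc{Sup}
\]

\[
\frac{D' \lor x \approx t' \qquad C' \lor s[f(\bar{s}_n)] \mathrel{\dot\approx} s'}
     {(C' \lor D' \lor s[t'] \mathrel{\dot\approx} s')\sigma}\ \textsc{VSup}
\]

where $\sigma = \{x \mapsto f(\bar{s}_n)\}$, and
$CS = t_1 \not\approx s_1 \lor \ldots \lor t_n \not\approx s_n$. Both rules
share the following side conditions. Let $t$ stand for either $f(\bar{t}_n)$ or
$x$. For Sup, the substitution $\sigma$ mentioned in the side conditions is of
course empty.


- $t \approx t'$ is strictly $\sigma$-eligible.
- $s[f(\bar{s}_n)] \mathrel{\dot\approx} s'$ is strictly $\sigma$-eligible if positive and $\sigma$-eligible if negative.
- $t\sigma \not\preceq t'\sigma$ and $s[f(\bar{s}_n)]\sigma \not\preceq s'\sigma$.
- $C\sigma \not\preceq D\sigma$.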
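\[
\frac{C' \lor f(\bar{t}_n) \approx v' \lor f(\bar{s}_n) \approx v}
     {C' \lor v \not\approx v' \lor f(\bar{s}_n) \approx v \lor CS}\ \textsc{EqFact}
\qquad
\frac{C' \lor u' \approx v' \lor u \approx v}
     {(C' \lor v \not\approx v' \lor u \approx v)\sigma}\ \textsc{VEqFact}
\]


For EQFACT, $CS = t_1 \not\approx s_1 \lor \ldots \lor t_n \not\approx s_n$. For
VEQFACT, either $u$ or $u'$ must be a variable and $\sigma$ is the most general
unifier of $u$ and $u'$. The side conditions for EQFACT are:


- $f(\bar{s}_n) \approx v$ be eligible in $C$.
- $f(\bar{s}_n) \not\preceq v$ and $f(\bar{t}_n) \not\preceq v'$.


The side conditions for VEQFACT are:


- $u \approx v$ be $\sigma$-eligible in $C$.
- $u\sigma \not\preceq v\sigma$ and $u'\sigma \not\preceq v'\sigma$.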


The calculus also contains the following resolution/unification inferences. We
refer to these as unification inferences, because each inference represents
carrying out a single step of the well-known Robinson unification algorithm [11].


\[
\frac{C' \lor f(\bar{s}_n) \not\approx f(\bar{t}_n)}{C' \lor CS}\ \textsc{Decompose}
\qquad
\frac{C' \lor x \not\approx t}{C'\sigma}\ \textsc{Bind}
\qquad
\frac{C' \lor t \not\approx t}{C'}\ \textsc{ReflDel}
\]


where for BIND, $\sigma = \{x \mapsto t\}$ and $x$ does not occur in $t$. For
DECOMPOSE, $f(\bar{s}_n) \neq f(\bar{t}_n)$ and
$CS = t_1 \not\approx s_1 \lor \ldots \lor t_n \not\approx s_n$. All three
inferences require that the final literal be $\sigma$-eligible in $C\sigma$ (for
DECOMPOSE and REFLDEL, $\sigma$ is empty).
We provide some examples to show how the calculus works. 


Example 1. Consider the unsatisfiable clause set: 
$C_1 = f(x, g(x)) \not\approx t \qquad C_2 = f(g(b), y) \approx t$


A Sup inference between $C_1$ and $C_2$ results in clause
$C_3 = t \not\approx t \lor x \not\approx g(b) \lor g(x) \not\approx y$. A
REFLDEL inference on $C_3$ results in the clause
$C_4 = x \not\approx g(b) \lor g(x) \not\approx y$. An application of BIND on
$C_4$ with $\sigma = \{x \mapsto g(b)\}$ results in
$C_5 = g(g(b)) \not\approx y$. Another application of BIND then leads to $\bot$.
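
The three unification inferences, and the tail of the derivation in Example 1, can be replayed with a short sketch. It is a simplification under stated assumptions: eligibility and selection are ignored, clauses are plain lists of triples (s, t, positive), negative literals are treated as unordered so that BIND may orient the variable to either side, and the function names refl_del, decompose and bind are hypothetical.

```python
# Terms: a variable is a str ("x"); an application is a tuple
# (symbol, arg1, ..., argn). A literal is (s, t, positive).


def substitute(subst, term):
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(subst, arg) for arg in term[1:])


def occurs(var, term):
    if isinstance(term, str):
        return var == term
    return any(occurs(var, arg) for arg in term[1:])


def refl_del(clause, i):
    """ReflDel: drop literal i if it is a negative literal t != t."""
    s, t, positive = clause[i]
    if positive or s != t:
        return None
    return clause[:i] + clause[i + 1:]


def decompose(clause, i):
    """Decompose: replace f(s1..sn) != f(t1..tn) by the constraints ti != si."""
    s, t, positive = clause[i]
    if positive or isinstance(s, str) or isinstance(t, str):
        return None
    if s == t or s[0] != t[0] or len(s) != len(t):
        return None
    constraints = [(ti, si, False) for si, ti in zip(s[1:], t[1:])]
    return clause[:i] + clause[i + 1:] + constraints


def bind(clause, i):
    """Bind: for x != t with x not occurring in t, apply {x -> t} to the rest."""
    s, t, positive = clause[i]
    if positive:
        return None
    if not isinstance(s, str):
        s, t = t, s  # orient the variable to the left if there is one
    if not isinstance(s, str) or occurs(s, t):
        return None
    rest = clause[:i] + clause[i + 1:]
    return [(substitute({s: t}, l), substitute({s: t}, r), p) for l, r, p in rest]


# Replaying Example 1 from C3 onwards; the constant ("t",) stands for the term t.
t_, b = ("t",), ("b",)
c3 = [(t_, t_, False), ("x", ("g", b), False), (("g", "x"), "y", False)]
c4 = refl_del(c3, 0)   # x != g(b)  or  g(x) != y
c5 = bind(c4, 0)       # g(g(b)) != y
empty = bind(c5, 0)    # the empty clause
```

A returned value of None means the rule does not apply to the chosen literal, and the empty list plays the role of the empty clause.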
Example 2. Consider the unsatisfiable clause set:


$C_1 = x \approx c \qquad C_2 = f(a, c) \not\approx t \qquad C_3 = f(c, c) \approx t$


A VSup inference between $C_1$ and $C_2$ results in clause
$C_4 = f(c, c) \not\approx t$. A SUP inference between $C_3$ and $C_4$ results
in the clause $C_5 = t \not\approx t \lor c \not\approx c \lor c \not\approx c$.
A triple application of REFLDEL starting from $C_5$ derives $\bot$.


Note 1. We abuse terminology and use inference and inference rule to refer both
to schemas such as those shown above and to concrete instances of such schemas.
Given an inference $\iota$, we refer to the tuple of its premises by
$prems(\iota)$, to its maximal premise by $mprem(\iota)$, and to its conclusion
by $concl(\iota)$.


4 Redundancy Criterion 


We utilise Waldmann et al.’s framework [25] for proving the completeness of 
our calculus. Hence, our redundancy criterion is based on their intersected lifted 
criterion. In instantiating the framework, we roughly follow Bentkamp et al. [6]. 
Let the calculus defined above be referred to as $Inf$. We introduce a ground
inference system $GInf$ that coincides with standard superposition [3]. That is,
it contains the three well-known inferences SUP, EQFACT and EQRES. We refer
to these inferences by GSUP, GEQFACT and GEQRES to indicate that they are
only applied to ground clauses. Following the notation of the framework, we
write $Inf(N)$ ($GInf(N)$) to denote the set of all $Inf$ ($GInf$) inferences
with premises in a clause set $N$. We introduce a grounding function
$\mathcal{G}$ that maps terms, literals and clauses to the sets of their ground
instances. For example, given a clause $C$, $\mathcal{G}(C)$ is the set
$\{C\theta \mid \theta \text{ is a grounding substitution}\}$. We extend the
function $\mathcal{G}$ to clause sets by letting
$\mathcal{G}(N) = \bigcup_{C \in N} \mathcal{G}(C)$ where $N$ is a set of
clauses.

A ground clause $C$ is redundant with respect to a set of ground clauses $N$
if there are clauses $C_1, \ldots, C_n \in N$ such that for $1 \le i \le n$,
$C_i \prec C$ and $C_1, \ldots, C_n \models C$. The set of all ground clauses
redundant with respect to a set of ground clauses $N$ is denoted
$GRed_{Cl}(N)$.

A clause $C$ is redundant with respect to a set of clauses $N$ if for every
$D \in \mathcal{G}(C)$, $D$ is redundant with respect to $\mathcal{G}(N)$ or
there is a clause $C' \in N$ such that $D \in \mathcal{G}(C')$ and
$C \sqsupset C'$, where $\sqsupset$ is the strict subsumption relation. That is,
$C \sqsupset C'$ if $C$ is subsumed by $C'$, but $C'$ is not subsumed by $C$.
The set of all clauses redundant with respect to a set of clauses $N$ is denoted
$Red_{Cl}(N)$.

In order to define redundant inferences, we have to pay careful attention to
selection functions. For non-ground clauses, we fix a selection function $sel$.
We then let $\mathcal{G}(sel)$ be a set of selection functions on ground clauses
with the following property. For each $gsel \in \mathcal{G}(sel)$, for every
ground clause $C$, there exists a clause $D$ such that $C \in \mathcal{G}(D)$
and the literals selected in $C$ by $gsel$ correspond to those selected in $D$
by $sel$. We write $GInf^{gsel}$ to show that the ground inference system $GInf$
is parameterised by the selection function $gsel$. Let $\iota$ be an inference
in $Inf$. We extend the grounding function $\mathcal{G}$ to a family of
grounding functions $\mathcal{G}^{gsel}$, one for each
$gsel \in \mathcal{G}(sel)$. Each function $\mathcal{G}^{gsel}$ maps terms,
literals and clauses as above, and maps members of $Inf$ to subsets of
$GInf^{gsel}$ as follows.¹


Definition 1 (Ground Instance of an Inference). Let $\iota$ be of the form
$C_1, \ldots, C_n \vdash E \lor CS$. An inference $\iota_G \in GInf^{gsel}$ is
in $\mathcal{G}^{gsel}(\iota)$ if it is of the form
$C_1\theta, \ldots, C_n\theta \vdash E\theta$ for some grounding substitution
$\theta$. In this case, we say that $\iota_G$ is the $\theta$-ground instance of
$\iota$. Note that we ignore the constraints in the definition of ground
instances.


A ground inference $C_1, \ldots, C_n, C \vdash E$ with maximal premise $C$ is
redundant with respect to a clause set $N$ if for $1 \le i \le n$,
$C_i \in GRed_{Cl}(N)$ or $C \in GRed_{Cl}(N)$ or there exist clauses
$D_1, \ldots, D_m \in N$ such that for $1 \le i \le m$, $D_i \prec C$ and
$D_1, \ldots, D_m \models E$. The set of all ground inferences redundant with
respect to a set $N$ is denoted $GRed_I^{gsel}(N)$.

An inference $\iota$ is redundant with respect to a clause set $N$ if for every
$gsel \in \mathcal{G}(sel)$ and for every $\iota' \in \mathcal{G}^{gsel}(\iota)$,
$\iota' \in GRed_I^{gsel}(\mathcal{G}(N))$. In words, every ground instance of
the inference is redundant with respect to $\mathcal{G}(N)$. We denote the set
of all redundant inferences with respect to a set $N$ as $Red_I(N)$.

A clause set N is saturated up to redundancy by an inference system Inf if 
every member of Inf (N) is redundant with respect to N. 


Note 2. Given the definition of clause redundancy above, the REFLDEL infer- 
ence can be utilised as a simplification inference. That is, the conclusion of the 
inference renders the premise redundant. 


5 Refutational Completeness 


To prove refutational completeness we utilise the above-mentioned framework of
Waldmann et al. [25]. In particular, we use Theorem 14 from the paper to lift
completeness from the ground level to the non-ground level. We restate Theorem
14 here for clarity and to keep the paper self-contained. We then present it in
our notation. Let $GRed = (GRed_I^{gsel}, GRed_{Cl})$ and
$Red = (Red_I, Red_{Cl})$.


Theorem 14 (from Waldmann et al. [25]). If $(GInf^q, Red^q)$ is statically
refutationally complete w.r.t. $\models^q$ for every $q \in Q$ and if for every
$N \subseteq \mathbf{F}$ that is saturated w.r.t. $FInf$ and
$Red^{\cap\mathcal{G}}$ there exists a $q$ such that
$GInf^q(\mathcal{G}^q(N)) \subseteq \mathcal{G}^q(FInf(N)) \cup Red_I^q(\mathcal{G}^q(N))$,
then $(FInf, Red^{\cap\mathcal{G}})$ is statically refutationally complete
w.r.t. $\models^{\cap}_{\mathcal{G}}$.


Theorem 14 (from Waldmann et al. in our Notation). If $(GInf^{gsel}, GRed)$ is
statically refutationally complete w.r.t. $\models$ for every
$gsel \in \mathcal{G}(sel)$ and if for every clause set $N$ that is saturated
w.r.t. $Inf$ and $Red$ there exists a $gsel$ such that
$GInf^{gsel}(\mathcal{G}^{gsel}(N)) \subseteq \mathcal{G}^{gsel}(Inf(N)) \cup GRed_I^{gsel}(\mathcal{G}^{gsel}(N))$,
then $(Inf, Red)$ is statically refutationally complete w.r.t.
$\models_{\mathcal{G}}$.


1 When a grounding function $\mathcal{G}^{gsel}$ acts on a clause, literal or
term, we commonly drop the $gsel$ superscript as the selection function plays no
role in the grounding of these.
Thus, in our context, the set $Q$ is $\mathcal{G}(sel)$, the ground inference
system $GInf^q$ maps to $GInf^{gsel}$, the ground redundancy criterion $Red^q$
is $(GRed_I^{gsel}, GRed_{Cl})$ and the ground entailment relation $\models^q$
maps to standard entailment on first-order clauses. Moreover, the non-ground
inference system $FInf$ maps to $Inf$ and the redundancy criterion
$Red^{\cap\mathcal{G}}$ maps to $(Red_I, Red_{Cl})$. Note that this final
mapping is not exact, as the criterion $Red^{\cap\mathcal{G}}$ does not allow
for a tiebreaker ordering, such as the strict subsumption relation, to be
utilised in the definition of non-ground redundancy. However, this mismatch can
easily be repaired since Theorem 16 of the framework paper extends the result of
Theorem 14 to the case where tiebreaker orderings are used.

As our ground inference systems $GInf^{gsel}$ are ground superposition systems,
static refutational completeness with respect to standard entailment and
standard redundancy is a well-known result; see for example [2]. What remains
for us to prove in order to apply Theorem 14 and show the static refutational
completeness of $Inf$ is:


1. For every $gsel \in \mathcal{G}(sel)$, the grounding function
$\mathcal{G}^{gsel}$ is a grounding function in the sense of the framework.

2. For every clause set $N$ saturated up to redundancy by $Inf$, there exists a
$gsel \in \mathcal{G}(sel)$ such that
$GInf^{gsel}(\mathcal{G}(N)) \subseteq \mathcal{G}^{gsel}(Inf(N)) \cup GRed_I^{gsel}(\mathcal{G}(N))$.
In words, there exists a ground selection function such that every ground
inference with that selection function and premises in $\mathcal{G}(N)$ is
either the instance of a non-ground inference with premises in $N$ or is
redundant with respect to $\mathcal{G}(N)$.


Lemma 1. For every $gsel \in \mathcal{G}(sel)$, the grounding function
$\mathcal{G}^{gsel}$ is a grounding function in the sense of the framework.


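Proof. We need to show that properties (G1)-(G3) defined by Waldmann et al.
hold for grounding functions. These properties are:


(G1) for every $\bot \in \mathbf{F}_\bot$, $\emptyset \neq \mathcal{G}(\bot) \subseteq \mathbf{G}_\bot$;
(G2) for every $C \in \mathbf{F}$, if $\bot \in \mathcal{G}(C)$ and $\bot \in \mathbf{G}_\bot$, then $C \in \mathbf{F}_\bot$;
(G3) for every $\iota \in FInf$, if $\mathcal{G}(\iota) \neq undef$, then $\mathcal{G}(\iota) \subseteq Red_I(\mathcal{G}(concl(\iota)))$.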


As properties (G1) and (G2) relate to the grounding of terms and clauses,
and our grounding of these is fully standard, we skip these. We prove (G3),
which in our terminology is: for every $\iota \in Inf$,
$\mathcal{G}^{gsel}(\iota) \subseteq GRed_I^{gsel}(\mathcal{G}(concl(\iota)))$.
This can be achieved by showing that for every
$\iota' \in \mathcal{G}^{gsel}(\iota)$, there exist clauses
$\bar{C} \in \mathcal{G}(concl(\iota))$ such that $\bar{C} \models concl(\iota')$
and for each $C_i \in \bar{C}$, $C_i \prec mprem(\iota')$. In what follows, let
$\theta$ be the substitution by which $\iota'$ is a grounding of $\iota$.

If $CS$ is the empty set in $concl(\iota)$, then
$concl(\iota)\theta = concl(\iota')$ and hence
$concl(\iota)\theta \models concl(\iota')$. Moreover,
$concl(\iota)\theta \in \mathcal{G}(concl(\iota))$ and thus
$concl(\iota)\theta \prec mprem(\iota')$.

On the other hand, if $CS$ is not empty, let $u = f(\bar{t}_n)$ and
$u' = f(\bar{s}_n)$ be the two terms within $prems(\iota)$ from which the
constraints are created. By the existence of $\iota'$, we have that
$u\theta = u'\theta$, and hence that $t_i\theta = s_i\theta$ for
$1 \le i \le n$. Hence, every literal in $CS\theta$ has the form
$t \not\approx t$ and is trivially false in every interpretation. Thus, we still
have $concl(\iota)\theta \models concl(\iota')$. Moreover, by the subterm
property of the ordering $\succ$ we have that $t_i\theta \not\approx s_i\theta$
is smaller than the maximal/selected literal of $mprem(\iota')$ for
$1 \le i \le n$ and hence that $concl(\iota)\theta \prec mprem(\iota')$.


Lemma 2. Let $\sigma$ be the most general unifier of terms $s$ and $s'$, and
$\theta$ be any unifier of the same terms. Then for any term $t$,
$(t\sigma)\theta = t\theta$.


Proof. Since $\sigma$ is the most general unifier, there must be a substitution
$\rho$ such that $\sigma\rho = \theta$. Hence
$(t\sigma)\theta = (t\sigma)\sigma\rho = t\sigma\rho = t\theta$, where the
second to last step follows from the fact that $\sigma$ is idempotent.


Lemma 3. For every clause set $N$ saturated by $Inf$, there exists a
$gsel \in \mathcal{G}(sel)$ such that
$GInf^{gsel}(\mathcal{G}(N)) \subseteq \mathcal{G}^{gsel}(Inf(N)) \cup GRed_I^{gsel}(\mathcal{G}(N))$.


Proof. For every $D \in \mathcal{G}(N)$ there must exist a clause $C \in N$ such
that $D \in \mathcal{G}(C)$. Let $\gg$ be an arbitrary well-founded ordering on
clauses. We let $C = \mathcal{G}^{-1}(D)$ denote the $\gg$-smallest clause such
that $D \in \mathcal{G}(C)$. We then choose the $gsel \in \mathcal{G}(sel)$ that
for a clause $D \in \mathcal{G}(N)$ selects the corresponding literals to those
selected by $sel$ in $\mathcal{G}^{-1}(D)$. Given this $gsel$, we need to show
that every inference with premises in $\mathcal{G}(N)$ is either the ground
instance of an inference with premises in $N$, or is redundant with respect to
$\mathcal{G}(N)$.

A SUP inference is redundant if the term t replaced in the second premise 
occurs at or below a variable. The proof is exactly the same as in the standard 
proof of the completeness of superposition [3], so we don’t repeat it. All other 
inferences can be shown to be the ground instance of inferences from clauses 
in N. 

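Let $\iota \in GInf^{gsel}$ be the following GSUP inference with premises in
$\mathcal{G}(N)$.


\[
\frac{D'\theta \lor t\theta \approx t'\theta \qquad C'\theta \lor s\theta[t\theta] \mathrel{\dot\approx} s'\theta}
     {C'\theta \lor D'\theta \lor s\theta[t'\theta] \mathrel{\dot\approx} s'\theta}
\]


where $\mathcal{G}^{-1}(D\theta) = D = D' \lor t \approx t'$,
$\mathcal{G}^{-1}(C\theta) = C = C' \lor s \mathrel{\dot\approx} s'$ and $\iota$
fulfils all the side conditions of GSUP. Let $\sigma$ be any substitution. The
literal $t\theta \approx t'\theta$ being strictly maximal in $D\theta$ implies
that $t\sigma \approx t'\sigma$ is strictly maximal in $D\sigma$ due to the
stability under substitution of $\succ$. The literal
$s\theta[t\theta] \mathrel{\dot\approx} s'\theta$ being (strictly) eligible in
$C\theta$ with respect to $gsel$ implies that
$s\sigma \mathrel{\dot\approx} s'\sigma$ is strictly eligible in $C\sigma$ with
respect to $sel$. Let $p$ be the position of $t\theta$ within $s\theta$ and let
$u$ be the subterm of $s$ at $p$. Since the term $t\theta$ does not occur below
a variable of $C$, such a position must exist. Moreover, $u$ cannot be a
variable, since if it were, $t\theta$ would occur at a variable of $C$. As
$\theta$ is a unifier of $u$ and $t$, it must be the case that either $t$ is a
variable, or $u$ and $t$ have the same top symbol. Further,
$D\theta \prec C\theta$ implies that $C\sigma \not\preceq D\sigma$,
$t\theta \succ t'\theta$ implies that $t\sigma \not\preceq t'\sigma$, and
$s\theta[t'\theta] \succ s'\theta$ implies $s\sigma \not\preceq s'\sigma$. Thus,
if $t$ is not a variable, there exists the following SUP inference $\iota'$
from clauses $D$ and $C$.


\[
\frac{D' \lor t \approx t' \qquad C' \lor s[u] \mathrel{\dot\approx} s'}
     {C' \lor D' \lor s[t'] \mathrel{\dot\approx} s' \lor CS}
\]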
We have that $(C' \lor D' \lor s[t'] \mathrel{\dot\approx} s')\theta = concl(\iota)$.
That is, the grounding of the conclusion of $\iota'$ less the constraint
literals is equal to the conclusion of $\iota$. Thus, $\iota$ is the
$\theta$-ground instance of $\iota'$ as per Definition 1. If $t$ is a variable
$x$, then there exists the following VSup inference $\iota'$ from clauses $D$
and $C$.


\[
\frac{D' \lor x \approx t' \qquad C' \lor s[u] \mathrel{\dot\approx} s'}
     {(C' \lor D' \lor s[t'] \mathrel{\dot\approx} s')\sigma}
\]


where $\sigma = \{x \mapsto u\}$ is the most general unifier of $t$ and $u$.
Thus, we can use Lemma 2 to show that $concl(\iota')\theta = concl(\iota)$ and
again $\iota$ is the $\theta$-ground instance of $\iota'$.

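Let $\iota \in GInf^{gsel}$ be the following GEQFACT inference with premise in
$\mathcal{G}(N)$.


\[
\frac{C'\theta \lor u'\theta \approx v'\theta \lor u\theta \approx v\theta}
     {C'\theta \lor v\theta \not\approx v'\theta \lor u\theta \approx v\theta}
\]


where $u'\theta = u\theta$,
$\mathcal{G}^{-1}(C\theta) = C = C' \lor u' \approx v' \lor u \approx v$ and
$\iota$ fulfils all the side conditions of GEQFACT. Let $\sigma$ be any
substitution. The literal $u\theta \approx v\theta$ being maximal in $C\theta$
implies that $u\sigma \approx v\sigma$ is maximal in $C\sigma$. Since $\theta$
is a unifier of $u'$ and $u$, at least one of them must be a variable, or they
must share a top symbol. Moreover, $u\theta \succ v\theta$ implies that
$u\sigma \not\preceq v\sigma$ and $u'\theta \succ v'\theta$ implies that
$u'\sigma \not\preceq v'\sigma$. If neither $u$ nor $u'$ is a variable, there
exists the following EQFACT inference $\iota'$ from $C$.


\[
\frac{C' \lor u' \approx v' \lor u \approx v}
     {C' \lor v \not\approx v' \lor u \approx v \lor CS}
\]


We have $(C' \lor v \not\approx v' \lor u \approx v)\theta = concl(\iota)$,
making $\iota$ the $\theta$-ground instance of $\iota'$ as per Definition 1. If
either $u$ or $u'$ is a variable, there exists the following VEQFACT inference
$\iota'$ from $C$.


\[
\frac{C' \lor u' \approx v' \lor u \approx v}
     {(C' \lor v \not\approx v' \lor u \approx v)\sigma}
\]


where $\sigma$ is the most general unifier of $u$ and $u'$. Thus, we can use
Lemma 2 to show that $concl(\iota')\theta = concl(\iota)$. Finally, let
$\iota \in GInf^{gsel}$ be the following GEQRES inference with premise in
$\mathcal{G}(N)$.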


\[
\frac{C'\theta \lor s\theta \not\approx s'\theta}{C'\theta}
\]

where $s\theta = s'\theta$,
$\mathcal{G}^{-1}(C\theta) = C = C' \lor s \not\approx s'$ and $\iota$ fulfils
all the side conditions of GEQRES. Let $\sigma$ be any substitution. The literal
$s\theta \not\approx s'\theta$ being eligible with respect to $gsel$ in
$C\theta$ implies that $s \not\approx s'$ is eligible in $C$ with respect to
$sel$. Since $\theta$ is a unifier of $s$ and $s'$, at least one of them must be
a variable, or they must share a top symbol. If $s = s'$, then there exists the
following REFLDEL inference $\iota'$ from $C$.


\[
\frac{C' \lor s \not\approx s}{C'}
\]


Otherwise we have two options. If either $s$ (or analogously $s'$) is a variable
$x$, then there is the following BIND inference $\iota'$ from $C$.


\[
\frac{C' \lor x \not\approx s'}{C'\sigma}
\]


Otherwise $s$ and $s'$ must share a top symbol and there is the following
DECOMPOSE inference $\iota'$ from $C$.


\[
\frac{C' \lor f(\bar{s}_n) \not\approx f(\bar{t}_n)}{C' \lor CS}
\]


In the first case, we have $concl(\iota')\theta = concl(\iota)$. In the second
case, $\sigma$ is the most general unifier of $s$ and $s'$, so we can use
Lemma 2 to show that $concl(\iota')\theta = concl(\iota)$. In the last case, we
have that $C'\theta = concl(\iota)$. Thus in all cases, $\iota$ is the
$\theta$-ground instance of $\iota'$.


Using Lemmas 1 and 3 we can instantiate Theorem 14 to prove the static 
refutational completeness of Inf. There is a slight issue here, as Theorem 14 
gives us refutational completeness with respect to Herbrand entailment. That is 
$N \models_{\mathcal{G}} M$ if $\mathcal{G}(N) \models \mathcal{G}(M)$. We would like to prove completeness with respect
to entailment as defined in Sect. 2 (known as Tarski entailment). This issue can 
easily be resolved by showing that the two concepts are equivalent with regards 
to refutations which can be achieved in a manner similar to Bentkamp et al. 
(Lemma 4.19 of [6]). 


Theorem 1 (Static refutational completeness). For a set of clauses $N$
saturated up to redundancy by $Inf$, $N \models \bot$ if and only if
$\bot \in N$.


Theorem 17 of Waldmann et al.’s framework can be used to derive dynamic 
refutational completeness from static refutational completeness. We refer readers 
to the framework for the formal definition of dynamic refutational completeness. 


Theorem 2 (Dynamic refutational completeness). The inference system
$Inf$ is dynamically refutationally complete with respect to the redundancy
criterion $(Red_I, Red_{Cl})$.


6 Extending to Higher-Order Logic 


We sketch how the ideas above can be extended to higher-order logic. This is 
ongoing research, and many of the technical details have yet to be fully worked 
out. Here, we provide a (very) informal description and then provide exam- 
ples. The higher-order unification problem is undecidable and there can exist a 
potentially infinite number of incomparable most general unifiers for a pair of 
terms [12]. Existing higher-order paramodulation style calculi deal with this issue 
in two main ways. One method is to abandon completeness and only unify to
some predefined depth [22]. Another approach is to produce potentially infinite 
streams of unifiers and interleave the fetching of items from such streams with 
the standard saturation procedure [7]. Our idea is to solve easy sub-problems 
eagerly, such as when terms are first-order or in the pattern fragment [16], and 
add harder sub-problems as constraints. We then utilise dedicated inferences 
on negative literals to mimic the rules of Huet’s well known (pre-)unification 
procedure [12]. We think that inferences similar to the following two, could be 
sufficient to achieve refutational completeness. 


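\[
\frac{C' \lor x\,\bar{s}_n \not\approx f\,\bar{t}_m}
     {(C' \lor x\,\bar{s}_n \not\approx f\,\bar{t}_m)\{x \mapsto \lambda \bar{y}_n.\ f\,(z_1\,\bar{y}_n) \ldots (z_m\,\bar{y}_n)\}}\ \textsc{Imitate}
\]


\[
\frac{C' \lor x\,\bar{s}_n \not\approx f\,\bar{t}_m}
     {(C' \lor x\,\bar{s}_n \not\approx f\,\bar{t}_m)\{x \mapsto \lambda \bar{y}_n.\ y_i\,(z_1\,\bar{y}_n) \ldots (z_p\,\bar{y}_n)\}}\ \textsc{Project}
\]


In both rules, each $z_i$ is a fresh variable of the relevant type, and
$x\,\bar{s}_n \not\approx f\,\bar{t}_m$ is selected in $C$. PROJECT has
$k \le n$ conclusions, one for each $y_i$ of suitable type.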
We hope that through a careful definition of the selection function, along with 
the use of purification, we can avoid the need to apply unification inferences 
to flex-flex literals (negative literals where both sides of the equality have vari- 
able heads). Moreover, we are hopeful that the calculus we propose can remain 
complete without the need for inferences that carry out superposition beneath 
variables such as the FLUIDSUP rule of $\lambda$-superposition [7] and the SUBVARSUP
rule of combinatory-superposition [9]. 


Example 3. Consider the unsatisfiable clause set: 


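$C_1 = f\,y\,(x\,a)\,(x\,b) \not\approx t \qquad C_2 = f\,c\,a\,b \approx t$


A Sup inference between $C_1$ and $C_2$ results in clause
$C_3 = t\sigma \not\approx t\sigma \lor x\,a \not\approx a \lor x\,b \not\approx b$
where $\sigma = \{y \mapsto c\}$. Assume that the literal $x\,a \not\approx a$
is selected in $C_3$. We can carry out either a PROJECT step on this literal or
an IMITATE step. The result of a PROJECT step is
$C_4 = (t\sigma \not\approx t\sigma \lor (\lambda z.\,z)\,a \not\approx a \lor x\,b \not\approx b)\{x \mapsto \lambda z.\,z\}$.
Applying the substitution and $\beta$-reducing results in
$C_5 = t\sigma \not\approx t\sigma \lor a \not\approx a \lor b \not\approx b$
from which it is easy to reach a contradiction.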


Example 4 (Example 1 of Bentkamp et al. [7]). Consider the unsatisfiable clause
set:


$C_1 = f\,a \approx c \qquad C_2 = h\,(y\,b)\,(y\,a) \not\approx h\,(g\,(f\,b))\,(g\,c)$


An EQRES inference on $C_2$ results in
$C_3 = y\,b \not\approx g\,(f\,b) \lor y\,a \not\approx g\,c$. An IMITATE
inference on the first literal of $C_3$ followed by the application of the
substitution and some $\beta$-reduction results in
$C_4 = g\,(z\,b) \not\approx g\,(f\,b) \lor g\,(z\,a) \not\approx g\,c$. A
further double application of EQRES gives us
$C_5 = z\,b \not\approx f\,b \lor z\,a \not\approx c$. We again carry out
IMITATE on the first literal followed by an EQRES to leave us with
$C_6 = x\,b \not\approx b \lor f\,(x\,a) \not\approx c$. We can now carry out a
SUP inference between $C_1$ and $C_6$ resulting in
$C_7 = x\,b \not\approx b \lor c \not\approx c \lor x\,a \not\approx a$ from
which it is simple to derive $\bot$ via an application of IMITATE on either the
first or the third literal. Note that the empty clause was derived without the
need for an inference that simulates superposition underneath variables, unlike
in [7].


Example 5 (Example 2 of Bentkamp et al. [7]). Consider the unsatisfiable clause
set:


$C_1 = f\,a \approx c \qquad C_2 = h\,(y\,(\lambda x.\,g\,(f\,x))\,a)\,y \not\approx h\,(g\,c)\,(\lambda w\,x.\,w\,x)$


An EQRES inference on $C_2$ results in
$C_3 = y\,(\lambda x.\,g\,(f\,x))\,a \not\approx g\,c \lor y \not\approx \lambda w\,x.\,w\,x$.
Assuming that the second literal is selected,² an EQRES inference results in
$C_4 = (y\,(\lambda x.\,g\,(f\,x))\,a \not\approx g\,c)\{y \mapsto \lambda w\,x.\,w\,x\}$.
Simplifying $C_4$ via applying the substitution and $\beta$-reducing, we achieve
$g\,(f\,a) \not\approx g\,c$. Superposing $C_1$ onto this clause we end up with
$C_5 = g\,c \not\approx g\,c$ from which the empty clause can easily be derived.
Note again that the empty clause has been derived without recourse to a
FLUIDSUP-like inference.


7 Experimental Results 


We implemented the calculus in the Vampire theorem prover [14]. We also imple- 
mented a variant of the calculus, that utilises fingerprint indices [19] to act as an 
imperfect filter. The completeness proof indicates that a superposition inference 
only needs to be carried out when the two terms can possibly unify. Therefore, 
we store terms in fingerprint indices, which act as fast imperfect filters for find- 
ing unification partners, and only carry out superposition inferences with terms 
returned by the index. This restricts, somewhat, the number of inferences that 
take place, at the expense of some loss of speed. Thus, it represents a midway
path between eager unification and delayed unification. As a final twist, we
implemented a version of the calculus that uses fingerprint indices as well as
solving constraint literals of the form $x \not\approx t$ (where $x$ is not a subterm
of $t$) and $t \not\approx t$ eagerly. Thus, in this version of the calculus there is no
need for the BIND and REFLDEL rules.
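
The eager treatment of trivial constraints described above can be illustrated with a small sketch. It is not the Vampire implementation: the term encoding (variables as strings, applications as tuples), the name `eager_solve`, and the choice to treat every negative equality as a constraint literal are assumptions made purely for illustration.

```python
# Terms: a variable is a str; an application is a tuple (symbol, *args),
# so the constant a is ("a",). A literal is (positive, lhs, rhs), where
# positive=False encodes a disequation lhs !~ rhs.
# Assumption of this sketch: every negative literal is a constraint literal.

def occurs(x, t):
    """True if variable x occurs in term t."""
    if isinstance(t, str):
        return x == t
    return any(occurs(x, a) for a in t[1:])


def substitute(t, x, s):
    """Replace every occurrence of variable x in term t by term s."""
    if isinstance(t, str):
        return s if t == x else t
    return (t[0],) + tuple(substitute(a, x, s) for a in t[1:])


def eager_solve(clause):
    """Exhaustively delete t !~ t and bind x := t for x !~ t with x not in t."""
    clause = list(clause)
    changed = True
    while changed:
        changed = False
        for i, (positive, lhs, rhs) in enumerate(clause):
            if positive:
                continue
            if lhs == rhs:
                # Trivially false disequation t !~ t: drop the literal.
                del clause[i]
                changed = True
                break
            if isinstance(rhs, str) and not isinstance(lhs, str):
                lhs, rhs = rhs, lhs  # orient so a variable is on the left
            if isinstance(lhs, str) and not occurs(lhs, rhs):
                # Solved constraint x !~ t: remove it and apply {x -> t}.
                rest = clause[:i] + clause[i + 1:]
                clause = [(p, substitute(a, lhs, rhs), substitute(b, lhs, rhs))
                          for p, a, b in rest]
                changed = True
                break
    return clause
```

Each pass removes one literal, so the loop terminates. Literals such as $x \not\approx f(x)$ fail the occurs check and are left in the clause for the remaining rules of the calculus.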

We compared each of these approaches with the standard superposition calculus
implemented in Vampire. We refer to the standard calculus as VAMPIRE and to
the delayed inference calculus without fingerprint indices as VAMPIRE*.³ We
refer to the delayed inference calculus with fingerprint indices as VAMPIRE†.
Finally, we refer to the calculus that eagerly solves some constraint literals
as VAMPIRE‡.⁴


2 Most orderings would select the first literal. In this case, we can still
derive a contradiction, but the proof is longer.

3 Our implementation can be found at
https://github.com/vprover/vampire/tree/delayed-unification. To run the new
calculus, use option -duc on. To run the standard calculus, the option duc is
set to off.

We tested these approaches against each other on benchmarks from the
CASC 2023 system competition [23]. As our new approach is not currently
compatible with higher-order or polymorphic input, we restricted the comparison
to monomorphic first-order problems. Namely, we used the 500 benchmarks in the
FNE and FEQ categories. These are monomorphic, first-order benchmarks that 
either include equality (FEQ) or do not contain equality (FNE). All benchmarks 
in the set are theorems. The results can be seen in Table 1. All experiments were 
run on a node cluster located at The University of Manchester. Each node in the 
cluster is equipped with 192 gigabytes of RAM and 32 Intel® Xeon processors 
with two threads per core. Each configuration was given 100s of CPU time per 
problem and run in single core mode. VAMPIRE was run with options --mode 
casc which causes it to use a tuned portfolio of strategies. All other variants 
were run with options --mode casc --forced_options duc=on which forces 
the use of the new calculus on top of the aforementioned portfolio. 


Table 1. Summary of experimental results 


Approach  | Solved | Uniques
VAMPIRE   | 430    | 110
VAMPIRE*  | 238    | 0
VAMPIRE†  | 255    |
VAMPIRE‡  | 322    | 2


The calculi based on delayed unification perform badly in comparison to 
standard superposition. This is unsurprising, as syntactic first-order unification is 
already an efficient process. By replacing it with delayed unification, we gain little 
in terms of time, but pay a heavy penalty in terms of the number of inferences 
carried out. The use of fingerprint indices helps somewhat in mitigating this issue, 
but not a great deal. Eagerly solving trivial constraints shows more promise and
is actually able to solve two problems that the standard calculus cannot (within
the time limit). These are the benchmarks CSR036+3.p and LAT347+3.p.


8 Related Work 


The only other proof calculi that we are aware of that explicitly integrate
unification rules at the calculus level are the higher-order paramodulation
calculi [8,22] and lazy paramodulation [21]. However, these calculi are
paramodulation calculi and do not incorporate certain concepts of redundancy so
crucial to the success of superposition provers. Moreover, the completeness
proofs for these calculi are based on very different techniques to the
Bachmair & Ganzinger style model building proofs commonly employed in the
completeness proofs of superposition calculi.


4 The code for both VAMPIRE† and VAMPIRE‡ can be found at branch
https://github.com/vprover/vampire/tree/delayed-unif-with-fp. VAMPIRE† was
built from commit c04a08feb5db3e7468aifa and VAMPIRE‡ from commit
fa2f139302b6a7a6487e73. Again, option -duc on is required for the new calculi
to run.

There are other calculi that in some form do represent the folding of unifica- 
tion into the calculus, but the link between the unification rules and the calculus 
is less clear. For example, the recent work by one of the authors of this paper [13] 
relating to reasoning about linear arithmetic, moves theory reasoning relating 
to a number of equations from the unification algorithm to the calculus level. 
A different example, by another author of this paper, is the combinatory-superposition
calculus [9] which essentially folds higher-order combinatory unification into the 
calculus. In both cases, the relationship between the unification algorithm and 
the calculus rules is not obvious. 

There are other methods of dovetailing unification with inference rules. For 
example, a unification procedure can be modified to return a stream of results. 
This stream can be interrupted in order to carry out further inferences and then 
returned to later. This is the approach taken by the higher-order Zipperposition 
prover [7] in order to handle the infinite sets of unifiers returned by higher-order 
unification. Conceptually, this is a very different solution to using constraints, 
since the intermediate terms created during unification are not available to the 
entire calculus as they are in our approach. Furthermore, from an implementa- 
tion perspective, streams of unifiers are a far greater departure from the stan- 
dard saturation architecture than the adding of constraints. Unification can also 
be partially delayed by preprocessing techniques such as Brand’s modification 
method and its developments [5]. 

As mentioned in the introduction, abstraction resembles the basic strategy 
[4,15], where unification problems are added to the constraint part of a clause. 
Periodically, these constraints can be checked for satisfiability and clauses with 
unsatisfiable constraints removed. However, in the basic strategy, the constraints 
do not interact with the rest of the proof calculus. Moreover, redundancy of 
clauses can no longer be defined in terms of ground instances, but only in terms 
of ground instances that satisfy the constraints. This significantly affects the 
simplification machinery of superposition/resolution.

Unification with abstraction was first introduced, to the best of our knowl- 
edge, by Reger et al. in [17] in the context of theory reasoning. However, the 
concept was introduced in an ad-hoc fashion with no theoretical analysis of 
its impact on the completeness of the underlying calculus. Recently, the rela- 
tionship between unification modulo an equational theory and unification with 
abstraction has been analysed [13] and a framework developed linking the two. 
It remains to explore whether the current work can fit into that framework. 


9 Conclusion 


We have developed a first-order superposition calculus that delays unification 
through the use of constraints, and proved its completeness. Whilst the calculus
does not perform well in practice, we feel that the calculus and its completeness
proof form a template that can be followed to prove the completeness of calculi
that involve unification procedures more complex than syntactic first-order
unification, for example unification modulo a set of equations $E$. Some of the crucial
features of our approach are: (1) the carrying out of partial unification and 
adding the remaining unification pairs back as constraints, and (2) the ignoring 
of constraint literals in the definition of redundant inference. In particular, fea- 
ture (1) may well be crucial in taming issues relating to undecidable unification 
problems. For example, in higher-order logic where unification is undecidable, it 
is common to run unification to a particular depth and then give up if termina- 
tion has not occurred. Of course, this harms completeness. With our approach it 
should be possible to add the remaining unification pairs back as constraints and 
maintain completeness. In the future, we would like to generalise our approach 
into a framework that can be used to prove the completeness of a variety of 
calculi as long as the unification problem for the underlying terms meets certain 
conditions. We would also like to explore instantiating such a framework to prove 
the completeness of particular calculi of interest to us such as AC-superposition 
and higher-order superposition. 


Acknowledgements. We acknowledge funding from the ERC Consolidator Grant 
ARTIST 101002685, the TU Wien Doctoral College SecInt, and the FWF SFB project 
SpyCoDe F8504.

Abstract. We introduce a calculus for incremental pre-processing for
SMT and instantiate it in the context of z3. It identifies when powerful 
formula simplifications can be retained when adding new constraints. Use 
cases that could not be solved in incremental mode can now be solved 
incrementally thanks to the availability of pre-processing. Our approach 
admits a class of transformations that preserve satisfiability, but not 
equivalence. We establish a taxonomy of pre-processing techniques that 
distinguishes cases where new constraints are modified or constraints 
previously added have to be replayed. We then justify the soundness of 
the proposed incremental pre-processing calculus. 


1 Introduction 


Pre-processing is a central ingredient for scaling automated deduction. These 
techniques apply targeted global simplification steps that can drastically reduce 
the complexity of problems before search techniques that use mainly local
inference steps are invoked. They are used across several solver domains, spanning
SAT, SMT, first-order automated theorem proving, constraint programming,
and integer programming. With the exception of SAT solvers, prior techniques
do not combine well when new constraints are added incrementally to a pre- 
processed state. Solvers have the option to restart pre-processing from scratch. 
This model is viable if the overall number of solver calls is small compared to 
time spent solving, but is not practical for scenarios where many minor varia- 
tions of a set of main constraints are queried. Such scenarios may be found in 
applications of dynamic symbolic execution or symbolic model checking. 

A procedure to incorporate pre- and in-processing techniques [27] into incre- 
mental SAT solvers was introduced in [18], where such incremental in-processing 
allowed a dramatic improvement in the performance of bounded model checking 
applications. In the case of SAT, the effect of a simplification step is recorded
in a reconstruction stack. Each eliminated clause is saved on that stack together
with a partial assignment, called its witness, that is used to show the
redundancy of the eliminated clause. For example, the redundancy of a blocked
clause is witnessed by its blocked literal, a literal upon which all resolvents
are tautological [26,32]. The reconstruction stack has two very important roles
in SAT solvers. First of all, it has all the information that is necessary for
model reconstruction [25]. When the elimination of a clause is not
model-preserving, its witness on the stack tells how to modify or extend any
found solution of the simplified formula such that it then satisfies the removed
clause as well. Beyond that, the reconstruction stack makes it possible to
recognize all those previous simplification steps that are potentially
invalidated by an incrementally added new constraint. For example, literals that
were blocked in the global state of the previous clauses might not be blocked
any more in the presence of some new constraints. Finding these clauses and
their cone of influence on the reconstruction stack makes it possible to undo
only the problematic previous simplification steps, thereby allowing pre- and
in-processing to be incremental [18].

Motivated by incremental in-processing SAT solvers, our goal here is to pave a 
path towards a similar mechanism in the context of SMT solvers. However, SMT 
problems extend propositional SAT formulas in several dimensions: the base the- 
ory of SMT is the theory of equality over uninterpreted functions and predicates, 
SMT formulas may contain quantifiers, and constants and functions that have 
interpretations over theories. Concrete cases of incremental SMT pre-processing
were considered in [19]. While most of the formula simplification techniques of
SAT solvers are captured by well studied redundancy properties [23], such a
unified understanding and description of SMT pre-processing techniques has not
yet been introduced. Though some redundancy notions of SAT solvers can be directly
embedded or generalized to SMT [80], a notion that appears to capture simplifi- 
cations in SMT in many cases is that of a substitution: an uninterpreted constant 
or function is defined into a solved form and the constraints are simplified based 
on the solution. When new constraints, containing the solved function symbols, 
are added after pre-processing, our method distinguishes between simplifications 
that allow applying the substitution to the new formula or removing the substi- 
tution and re-adding the old constraints that were simplified. We have found it 
useful to characterize pre-processing simplifications by the following categories. 


Equivalence Preserving Simplifications. Many simplification methods are based 
on equivalence preserving simplifications. For example $x > x - y + 1$ simplifies to
$y > 1$. They are automatically incremental by virtue of not changing the set of
models. Developing equivalence preserving simplifications is a significant area of 
research and engineering by itself. A good example is using and-inverter graphs 
(AIGs) for simplifying propositional and first-order formulas [24,45]. The main 
challenge with developing equivalence preserving simplifications in an incremen- 
tal setting is to make them efficient. 


Rigid Constrained Simplifications. An important class of simplifications is
based on eliminating variables by finding solutions to them. In the formula
$x \leq y + 1 \land x \geq y + 1 \land \varphi[x, y]$ we can solve for $x$ (or $y$) by
setting $x \simeq y + 1$ and then substituting the solution for $x$ into $\varphi$. The
simplified formula is $\varphi[y + 1, y]$. The set of models of the original formula
must all satisfy the equality $x \simeq y + 1$. This property allows the
simplification to be reused when later adding a formula $\psi[x, y]$. It can be
added by applying the solution for $x$: $\psi[y + 1, y]$.
A model of $\varphi[y + 1, y] \land \psi[y + 1, y]$ must conversely correspond to a
model of the original formulas $\varphi[x, y]$ and $\psi[x, y]$. The equality
$x \simeq y + 1$ is used in a
model converter to establish the original model. Some pre-processing techniques
translate constraints from one domain to another. For example, formulas over 
bounded integers can be solved by translation into bit-vectors. This translation 
can be described with a set of equalities where bounded integers are solved for 
their bit-vector representation (see later an example in Table 1). 


Under Constrained Simplifications. The rigid constrained simplifications already 
cover a significant class of pre-processing methods. Allowing incrementally solv- 
ing for variables has a profound practical effect on using z3 incrementally in 
user scenarios. There is however a larger class of simplifications that also allow 
eliminating variables but do not preserve solutions to the eliminated variable. 
These simplifications have the same or more solutions for symbols in the
original formula and we call them under-constrained. For example, the formula
$((x \simeq y \land y < z + u) \lor y \geq z - u)$ contains $x$ in only one position. It can be
replaced by the formula $((b \land y < z + u) \lor y \geq z - u)$ where $b$ is fresh. Similarly
introducing definitions of fresh symbols does not eliminate solutions to sym- 
bols in the original formula. Lastly, when removing redundant clauses, the new 
formula may have more solutions. Tseitin transformation introduces definitions 
that allow removing redundant, non-CNF, formulas. 


Over Constrained Simplifications. Symmetry reduction [14,38] and strengthening
using propagation redundancy criteria [37] are prominent examples of
simplifications that apply strengthening to reduce the search space. These
transformations are not among the classes covered by our main result. We leave it to
future work to examine whether or how to incorporate strengthening: one avenue
is to leverage assumption literals [16] to temporarily enable strengthenings either 
as part of pre-processing or during search [39]. 

Table 1 summarizes the main categories of pre-processing techniques dis- 
cussed so far. This paper develops a calculus of incremental pre-processing for 
rigid constrained, under-constrained, clause elimination, and introduction of def- 
initions. However, it does not discuss further over-constrained simplifications. 

In this paper we introduce the concept of simplification modulo substitu- 
tions and show that the main SMT pre-processing methods maintain such a 
property. Based on that, we show how to apply or revert the effect of previous 
pre-processing steps when new formulas are added after simplification. 


2 Preliminaries 


We assume the usual notions of first-order logic with equality, satisfiability,
logical consequence and theory, as described e.g. in [17]. An interpretation $M$
for a signature $\Sigma$ (or $\Sigma$-model) consists of a non-empty set $U_M$ called the
universe of the model, and a mapping $(\_)^M$ assigning to each variable and
constant symbol an element of $U_M$, to each $n$-ary function symbol $f$ in $\Sigma$ an
$n$-ary function $f^M$ from $U_M^n$ to $U_M$, and to each $n$-ary predicate symbol $p$
in $\Sigma$ an $n$-ary function from the set $U_M^n$ to distinguished values
representing true and false. Note that to keep the presentation simple, we only
consider a single universe in the models. Interpretations extend to terms by
composition.


Table 1. Main categories of pre-processing techniques found in SMT solvers. Function
ite is an abbreviation for if-then-else and bv2int is a function that maps a bit-vector
to an integer value.


category    | example input | example output | model converter
equivalence | $x > x - y + 1$ | $y > 1$ | $\epsilon$
rigid       | $x \simeq t, \varphi$ | $\varphi[t/x]$ | $x \mapsto t$
            | $0 \leq x \leq 1 \land (x \simeq 1 \lor y > 0)$ | $x_b \lor y > 0$ | $x \mapsto ite(x_b, 1, 0)$
            | $1 \leq x \leq 4 \land (x \simeq 1 \lor y > 0)$ | $b_{[2]} \simeq 0 \lor y > 0$ | $x \mapsto 1 + bv2int(b_{[2]})$
under       | $F, ((x \simeq t \land \varphi) \lor \psi)$, where $x \notin FV(\psi), FV(F)$ | $F, (\varphi[t/x] \lor \psi)$ | $x \mapsto t$
            | $F, x \leq y, x \leq z, y \leq u$, where $x, y \notin FV(F)$ | $F$ | $x \mapsto min(y, z), y \mapsto u$
def-intro   | $(a \land b) \lor c$ | $\neg x_p \lor a, \neg x_p \lor b, x_p \lor c$ | $\epsilon$
redundant   | $F, \neg p \lor \neg q, p \lor q$, where $p$ is positive in $F$ | $F, \neg p \lor \neg q$ | $p \mapsto p \lor \neg q$
over        | $p(x), p(y), p(z)$ | $x \leq y \leq z, p(x), p(y), p(z)$ | $\epsilon$

We use the terminology symbols referring to uninterpreted symbols (variables)
and function symbols. Given a model $M$ and a symbol $x$, the model
$M[x \mapsto a]$ is exactly the same as $M$, except that $x^M = a$ where $a \in U_M$ for
0-ary symbols and $a$ is a function over $U_M$ for $n$-ary function or predicate symbols.


Lemma 1 (Translation Lemma [41]). If $F$ is a formula and $t$ is a term
s.t. no variable in $t$ occurs bound in $F$, then $M \models F[t/x]$ iff $M[x \mapsto t^M] \models F$.


Note that we may use $\lambda$-terms to represent updates to function and predicate
symbols. The interpretation of a $\lambda$-term is a function.

We use the term Skolem symbols for $n$-ary function symbols (where $n = 0$ is
possible) that cannot occur in input formulas. Only pre-processing methods may
introduce Skolem symbols, which guarantees that they are fresh.


Convention 1 (Variable non-capture). Throughout this paper we assume 
that free and bound variables are disjoint, such that when we substitute a term t 
for a variable x in formula F, none of the variables in t are captured. 


Definition 1 (Labeled substitution). $(x \mapsto t; \Psi)^{\beta}$ represents a substitution of
$x$ by $t$, justified by the formula $\Psi$. The label $\beta$ is either $\top$ or $\bot$ and it indicates
whether the map $x \mapsto t$ may be used as an equal replacement of $\Psi$.


Example 1. The labeled substitution $(x \mapsto y + 1; x \simeq y + 1)^{\bot}$ represents the
substitution of $x$ by $y + 1$ justified by the formula $x \simeq y + 1$. The label $\bot$ of
the substitution indicates that applying the substitution on a formula $F$ where
$x \simeq y + 1$ is present does not change the set of models of the formula.


Definition 2. Given $\theta = (x_1 \mapsto t_1; \Psi_1)^{\beta_1} (x_2 \mapsto t_2; \Psi_2)^{\beta_2} \ldots (x_n \mapsto t_n; \Psi_n)^{\beta_n}$ and
an interpretation $M$, we define the interpretation $M\theta$ as follows:

    $M\epsilon = M$
    $M\,\theta(x \mapsto t; \Psi)^{\beta} = (M[x \mapsto t^M])\theta$

Definition 3. Given $\theta = (x_1 \mapsto t_1; \Psi_1)^{\beta_1} (x_2 \mapsto t_2; \Psi_2)^{\beta_2} \ldots (x_n \mapsto t_n; \Psi_n)^{\beta_n}$ and
a formula $F$, we define the formula $F\theta$ as follows:

    $F\epsilon = F$
    $F\,(x \mapsto t; \Psi)^{\beta}\theta = (F[t/x])\theta$

Informally, a sequence of substitutions $\theta$ is applied to interpretations from
right to left (i.e. backwards), while to formulas from left to right (i.e. forward).


Further, note that the translation lemma generalizes in a straightforward way
to substitutions.
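
The two directions of application can be made concrete with a small sketch. The encoding is an assumption chosen for illustration: integer arithmetic terms as nested tuples, interpretations as dictionaries, and substitutions as plain `(variable, term)` pairs with labels and justifications omitted. The helper names are hypothetical.

```python
# Terms: variable (str), integer constant (int), or ("+", a, b).
# Formulas: ("<=", a, b), ("=", a, b) or ("and", f, g).

def subst(e, x, t):
    """e[t/x]: replace variable x by term t in a term or formula."""
    if isinstance(e, str):
        return t if e == x else e
    if isinstance(e, int):
        return e
    return (e[0],) + tuple(subst(a, x, t) for a in e[1:])


def ev(e, m):
    """Evaluate a term or formula under interpretation m (dict var -> int)."""
    if isinstance(e, str):
        return m[e]
    if isinstance(e, int):
        return e
    op, a, b = e
    if op == "+":
        return ev(a, m) + ev(b, m)
    if op == "<=":
        return ev(a, m) <= ev(b, m)
    if op == "=":
        return ev(a, m) == ev(b, m)
    if op == "and":
        return ev(a, m) and ev(b, m)
    raise ValueError(op)


def apply_to_formula(f, theta):
    """F theta: substitutions are applied left to right."""
    for x, t in theta:
        f = subst(f, x, t)
    return f


def apply_to_model(m, theta):
    """M theta: substitutions are applied right to left."""
    m = dict(m)
    for x, t in reversed(theta):
        m[x] = ev(t, m)
    return m


def agrees(f, theta, m):
    """Generalized translation lemma: M |= F theta  iff  M theta |= F."""
    return ev(apply_to_formula(f, theta), m) == ev(f, apply_to_model(m, theta))


THETA = [("x", ("+", "y", 1)), ("y", ("+", "z", 2))]
F = ("<=", "x", ("+", "y", "z"))
```

With `THETA` and `F` as above, `agrees(F, THETA, {"x": 0, "y": 0, "z": k})` holds for every integer `k`, whereas applying the pairs to the interpretation in left-to-right order would evaluate `x` against a stale value of `y`.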


3 Incremental Pre-processing 


In this section we introduce a calculus to describe incremental pre-processing for 
SMT based on the following notion. 


Definition 4 (Simplification modulo $\theta$). We say that the formula $F$
simplifies to $F'$ modulo $\theta$, denoted $F \Rightarrow_{\theta} F'$, if


- If $M \models F$ then there is a model $M'$ such that $M' \models F'$ and $M'$ agrees with
  $M$ on all symbols that are in $F$ or in background theories or not in $F'$.
- If $M' \models F'$ then $M'\theta \models F$.


It follows that simplification allows transitive chaining assuming that symbols
are not recycled.


Lemma 2 (Transitivity of simplification). Let $F \Rightarrow_{\theta} F'$ and $F' \Rightarrow_{\theta'} F''$
such that every symbol that is both in $F$ and $F''$ also occurs in $F'$ (i.e. old
symbols are not re-introduced). Then $F \Rightarrow_{\theta\theta'} F''$.


3.1 Simplification Rules 


There are several possible situations where the concept of simplification modulo
substitutions can be used to capture potential simplification steps. For example,
a useful special case for simplification modulo $\theta$ is when a formula $F$ implies an
equality $x \simeq t$ that can then be turned into a substitution to simplify $F$.


46 N. Bjørner and K. Fazekas


Example 2. The formula $isCons(x) \land F[x]$ implies $\exists h, t \,.\, x \simeq cons(h, t)$, where
$h, t$ are fresh variables (corresponding to the head and tail of a cons list). We may
substitute $x$ by $cons(h, t)$ in $F[x]$ to eliminate $x$. The literal $isCons(cons(h, t))$ is
equivalent to true and $F[cons(h, t)]$ is a model simplification of the original formula
modulo $x \simeq cons(h, t)$.


There are also useful special cases where a formula $F$ does not imply an
equality $x \simeq t$, but the same equality may still be used to simplify $F$.


Example 3. In the formula $F := ((x \simeq 3 \land x > u) \lor y > u) \land u > z$ we can
substitute $x \mapsto 3$ and retain simplification. The formula $F$ simplifies to
$F[3/x] := (3 > u \lor y > u) \land u > z$, but $F$ does not imply $x \simeq 3$.
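
As a sanity check of Example 3, the two conditions of Definition 4 can be tested by brute force over a small finite range of integers. This is only an illustration under assumed names (`F`, `F_simplified`, `DOMAIN`, `check_example`); an exhaustive check over a finite range is not a proof for the integers in general.

```python
from itertools import product


def F(x, y, u, z):
    """Original formula: ((x = 3 and x > u) or y > u) and u > z."""
    return ((x == 3 and x > u) or y > u) and u > z


def F_simplified(y, u, z):
    """F[3/x] after simplifying 3 = 3: (3 > u or y > u) and u > z."""
    return (3 > u or y > u) and u > z


DOMAIN = range(0, 6)  # assumed finite sample of integer values


def check_example():
    # Every model of F also satisfies the simplified formula
    # (the same values for y, u, z can be kept).
    forward = all(F_simplified(y, u, z)
                  for x, y, u, z in product(DOMAIN, repeat=4) if F(x, y, u, z))
    # Every model of the simplified formula, updated with x := 3, satisfies F.
    backward = all(F(3, y, u, z)
                   for y, u, z in product(DOMAIN, repeat=3) if F_simplified(y, u, z))
    # F does not force x = 3: some model of F assigns another value to x.
    implies_equality = all(x == 3
                           for x, y, u, z in product(DOMAIN, repeat=4) if F(x, y, u, z))
    return forward, backward, implies_equality
```

On this range `check_example()` returns `(True, True, False)`: both directions of the simplification hold, while a model such as $x = 0, y = 2, u = 1, z = 0$ shows that the equality itself is not implied.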


There are also cases where substitutions are not suitable to describe the
relation between $F$ and $F'$. It is easier to characterize these by the property that
$F'$ is a proper subset of $F$.


Example 4. A blocked clause $p \lor C$ can be removed from a set of formulas without
changing satisfiability: $F, (p \lor C) \Rightarrow_{p \mapsto p \lor \neg C} F$. If we were to substitute $p$ by
$p \lor \neg C$ everywhere in $F$ it would weaken clauses where $p$ occurs positively.


Finally, it is possible to accommodate cases where pre-processing introduces
definitions, such as through the unfold transformation (see Sect. 6.5), or by
Skolemization and Tseitin transformations.


Example 5. The Skolemization of $\forall x \,.\, \exists y \,.\, p(x, y)$ is $\forall x \,.\, p(x, f_{sk}(x))$. Here the
original quantified formula is replaced by the Skolemized formula.


We model the pre-processing performed by an SMT solver as a sequence of 
abstract states where each state consists of two components: a formula F and 
an ordered sequence of labeled substitutions $\theta$. Based on the shown cases, we
formulate the following conditions for applying simplification rules in Fig. 1. 


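RIGID:

    $F \,\|\, \theta \Longrightarrow F[t/x] \,\|\, \theta(x \mapsto t; \Psi)^{\bot}$   if $\Psi \subseteq F$, $x \notin t$, and $\Psi \rightarrow \exists y \,.\, x \simeq t[y]$

FLEX:

    $F, \Psi \,\|\, \theta \Longrightarrow F, \Psi[t/x] \,\|\, \theta(x \mapsto t; \Psi)^{\top}$   if $x \in \Psi$, $x \notin F$ and $\Psi \Rightarrow_{x \mapsto t} \Psi[t/x]$

UPDATE:

    $F, \Psi \,\|\, \theta \Longrightarrow F, \Phi \,\|\, \theta(x \mapsto t; \Psi)^{\top}$   if $F, \Psi \Rightarrow_{x \mapsto t} F, \Phi$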


Fig. 1. A calculus for pre-processing in SMT 


We formulated the side conditions so that they identify a minimal set of
conjuncts $\Psi$ of $F$ involved with the solution for $x$. Note that a simplification
remains valid when adding conjuncts that do not contain $x$. The UPDATE rule
broadly handles a set of simplifications, including proof rules from DRAT
systems and introduction of definitions and Skolemization. It may be presented in
forms where $\Phi$ or $\Psi$ or the substitution are empty. The substitution $x \mapsto t$
generally represents a tuple of symbols $x$ replaced by terms $t$. To simplify
presentation we only discuss the case where $x$ is a single symbol and we elide
rules that preserve equivalence. The UPDATE rule records $\Psi$ so it can later be
re-added in case a new constraint mentions $x$. This may be overkill when
$\Phi[t/y] = \Psi$ for $y$ fresh (in Sect. 4 we will show another rule, INVERT, that
adds only the equality $y \simeq t$ in such cases).


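Lemma 3. If $F \rightarrow \exists y \,.\, x \simeq t[y]$, s.t. $y \notin F$, $x \notin t$, and $t$ is substitutable for $x$
in $F$, then $F \Rightarrow_{x \mapsto t} F[t[y]/x]$.


Proof. Let $M$ be an interpretation s.t. $M \models F$. Then $M \models F \land \exists y \,.\, x \simeq t[y]$
and by definition of the satisfaction relation, there must exist an $a \in U_M$,
s.t. $M[y \mapsto a] \models F \land x \simeq t[y]$. Let $M'$ denote $M[y \mapsto a]$. From $M' \models F \land x \simeq t[y]$
it follows that $x^{M'} = t[y]^{M'}$ and so $F^{M'} = F[t[y]/x]^{M'}$. Since $M' \models F$, we have
that $M' \models F[t[y]/x]$. For the other direction, when $M' \models F[t[y]/x]$, due to
Lemma 1, $M'[x \mapsto t[y]^{M'}] \models F$. Hence, $F \Rightarrow_{x \mapsto t} F[t[y]/x]$.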


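Corollary 1. The side-condition for RIGID implies that $F \Rightarrow_{x \mapsto t} F[t/x]$.

Lemma 4. Assume $\Psi \subseteq F$, $x \notin F \setminus \Psi$ and $\Psi \Rightarrow_{x \mapsto t} \Psi[t/x]$, then $F \Rightarrow_{x \mapsto t} F[t/x]$.


Proof. Since $x \notin F \setminus \Psi$, $(F \setminus \Psi) = (F \setminus \Psi)[t/x]$, thus $(F \setminus \Psi) \Rightarrow_{x \mapsto t} (F \setminus \Psi)[t/x]$.
Then, from $\Psi \Rightarrow_{x \mapsto t} \Psi[t/x]$ it follows that $F \Rightarrow_{x \mapsto t} F[t/x]$.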


Lemma 3 established that the side-condition for RIGID ensures simplification
modulo $\theta$. We therefore have the following corollaries.


Corollary 2. If a formula $F'$ is derived from $F$ by the inferences from Fig. 1,
then it has the property $F \Rightarrow_{x \mapsto t} F'$.


The other rules enforce preservation of satisfiability in their side-conditions.

Corollary 3. The rules from Fig. 1 preserve satisfiability.


The transitive application of the simplifications also preserve satisfiability in 
a way that extends the notion of simplification modulo a substitution. 


Proposition 1. Consider a formula $F_0$ and a state $F \,\|\, \theta$ derived from $F_0 \,\|\, \epsilon$
using the rules from Fig. 1. Then $F_0 \Rightarrow_{\theta} F$.


Proof. It follows because Corollary 2 notes that each application of a rule from
Fig. 1 is a simplification modulo a substitution, and Lemma 2 notes that
simplification modulo substitutions is transitive.


Informally, Proposition 1 means that using $\theta$, one can transform any model
of the simplified formula into a model of the original input formula. Note that
the simplified $F$ may contain fresh Skolem symbols that do not occur in $F_0$.

3.2 Pre-processing Replay


The rules of Fig. 1 capture possible pre-processing steps that can be applied to
a single SMT problem. We now describe the scenario where we add additional
constraints $\Phi$ to a pre-processed state. Without incremental pre-processing we
have the option to conjoin $\Phi$ to the original formula $F_0$ and re-run pre-processing.
The goal of incremental pre-processing is to retain as much of the effect of
previous work as possible.

We will show that for pre-processing steps derived by rule RIGID it is possible
to apply the corresponding substitution to $\Phi$ directly, while the other
simplification steps may require re-introducing formulas that were previously
removed. We call this process of applying the effect of simplifications to a new
formula pre-processing replay. Figure 2 shows an imperative implementation of
pre-processing replay.


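Replay (formula $\Phi$, substitution sequence $\theta = \sigma_1, \ldots, \sigma_n$)
    $\theta' := ()$
    for $(x_i \mapsto t_i; \Psi_i)^{\beta_i}$ from $\sigma_1$ to $\sigma_n$
        if $x_i \in FV(\Phi)$ then
            if $\beta_i = \top$ then              // substitution is not RIGID
                $\Phi := \Phi \land \Psi_i$             // re-introduce
            else
                $\Phi := \Phi[t_i/x_i]$           // apply
                $\theta' := \theta' (x_i \mapsto t_i; \Psi_i)^{\beta_i}$
        else
            $\theta' := \theta' (x_i \mapsto t_i; \Psi_i)^{\beta_i}$
    return $(\Phi, \theta')$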


Fig. 2. Algorithm Replay 
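
A minimal executable sketch of the procedure in Fig. 2 follows. It is not the z3 implementation: the tuple encoding of formulas, the `LabeledSubst` record, and the helper names are assumptions made for illustration, and formulas are taken to be quantifier-free so that every variable occurrence is free.

```python
from typing import NamedTuple


class LabeledSubst(NamedTuple):
    """(x -> t; Psi)^beta; rigid=True stands for label bottom, False for top."""
    var: str
    term: object
    justification: object
    rigid: bool


def free_vars(e):
    """Variables (str leaves) of a quantifier-free term or formula."""
    if isinstance(e, str):
        return {e}
    if isinstance(e, tuple):
        # e = (operator, *args); the operator name itself is not a variable
        return set().union(*(free_vars(a) for a in e[1:]))
    return set()


def subst(e, x, t):
    """e[t/x] on the tuple encoding."""
    if isinstance(e, str):
        return t if e == x else e
    if isinstance(e, tuple):
        return (e[0],) + tuple(subst(a, x, t) for a in e[1:])
    return e


def replay(phi, theta):
    """Return (phi', theta') following Fig. 2."""
    theta_new = []
    for s in theta:
        if s.var in free_vars(phi):
            if not s.rigid:
                # Non-rigid step: re-introduce the removed constraint
                # and drop the substitution from the sequence.
                phi = ("and", phi, s.justification)
            else:
                # Rigid step: apply the solution and keep the substitution.
                phi = subst(phi, s.var, s.term)
                theta_new.append(s)
        else:
            theta_new.append(s)
    return phi, theta_new
```

For example, with a rigid entry for `x` and a non-rigid entry for `w`, replaying the new constraint `("<=", "x", "w")` rewrites `x` by its solution, conjoins the justification recorded for `w`, and returns a sequence that contains only the rigid entry.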


Our main proposition summarizes the main property of Replay and ensures
that an arbitrary formula $\Phi$ can be added mid-stream after pre-processing.


Proposition 2. Let $F \,\|\, \theta$ be a state resulting from pre-processing $F_0$, and let
$F \land \Phi' \,\|\, \theta'$ be a state produced by applying procedure Replay to $\Phi$ and $\theta$, then
$F_0 \land \Phi$ is equi-satisfiable to $F \land \Phi'$.


To establish Proposition 2 we will introduce a calculus for reverting the
effect of simplifications. It is shown in Fig. 3 and comprises two rules: one
adds a formula with a substitution to $F$, the other both reverts the effect of
a simplification and adds the reverted formula to $F$. The inferences rely on a
side-condition that the formulas $\Phi, \Psi$ are clean relative to the substitution $\theta$.


Definition 5. A formula $\Phi$ is clean w.r.t. a substitution sequence $\theta$ iff

- $\theta = \epsilon$, or
- $\theta = (x \mapsto t; \Psi)^{\beta}\theta'$, $x \notin \Phi$ and $\Phi$ is clean with respect to $\theta'$, or
- $\theta = (x \mapsto t; \Psi)^{\bot}\theta'$ and $\Phi[t/x]$ is clean with respect to $\theta'$.


ADD:

    $F \,\|\, \theta \Longrightarrow F, \Phi\theta \,\|\, \theta$   if $\Phi$ is clean w.r.t. $\theta$

UNDO:

    $F \,\|\, \theta(x \mapsto t; \Psi)^{\beta}\theta' \Longrightarrow F, \Psi\theta' \,\|\, \theta\theta'$   if $\Psi$ is clean w.r.t. $\theta'$


Fig. 3. A calculus for reverting pre-processing. UNDO reverts a simplification by
re-introducing a constraint. It prunes $\theta$ until ADD applies for a new constraint $\Phi$.


Thus, intuitively, $\Phi$ is clean w.r.t. $\theta$ if $\Phi\theta$ uses only RIGID substitutions from $\theta$.
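
Since cleanliness only depends on which symbols occur in the formula, it can be sketched on free-symbol sets alone. The abstraction below is an assumption of this sketch: a formula is represented by its set of free symbols, each substitution by a hypothetical triple `(var, symbols_of_term, rigid)`, and "$x \notin \Phi$" is read as "$x$ does not occur free in $\Phi$".

```python
def is_clean(formula_symbols, theta):
    """Check Definition 5 on free-symbol sets.

    formula_symbols: set of symbols occurring in the formula.
    theta: list of (var, symbols_of_term, rigid) triples, leftmost first;
           rigid=True stands for label bottom.
    """
    symbols = set(formula_symbols)
    for var, term_symbols, rigid in theta:
        if var not in symbols:
            continue  # second case: the substitution does not touch the formula
        if not rigid:
            return False  # a non-rigid substitution would have to be applied
        # third case: applying [t/x] replaces x by the symbols of t
        symbols = (symbols - {var}) | set(term_symbols)
    return True  # first case: the remaining sequence is empty
```

For instance, with `theta = [("x", {"y"}, True), ("y", {"z"}, False)]` a formula over `{"x"}` is not clean, because substituting `x` introduces `y`, which is then hit by a non-rigid substitution; a formula over `{"w"}` is clean.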
We now establish that formulas that are clean relative to $\theta$ can be added
(after substitution) to formulas while maintaining models. The substitution used
in rigid updates corresponds to equalities that are consequences.


Lemma 5. Given a state $F' \,\|\, \theta\theta'$ derived from the state $F \,\|\, \theta$ and formula $\Phi$
that is clean with respect to $\theta'$, then $F \land \Phi \Rightarrow_{\theta'} F' \land \Phi\theta'$.


Proof. We examine the two directions.


- Let $M \models F \land \Phi$. Induction on the length of the derivation from $F$ to $F'$
  establishes that if $M \models F$, then there is a corresponding $M'$ such that
  $M' \models F' \land \bigwedge_{(x \mapsto t; \Psi)^{\bot} \in \theta'} x \simeq t$: Each time RIGID is applied a new equality is
  used for simplification $F_1[t_1/x_1]$. The equality can be added to the result,
  $F_1[t_1/x_1] \land x_1 \simeq t_1$, without changing satisfiability because $x_1$ does not occur
  in $F_1[t_1/x_1]$. Thus, the resulting model $M'$ can be constrained to satisfy all
  equalities used in rigid substitutions. Since $M' \models \Phi$ already, then $M' \models \Phi\theta'$.

- Let $M' \models F' \land \Phi\theta'$. Then from the assumption of simplification modulo $\theta'$,
  we get $M'\theta' \models F$. Lemma 1 ensures $M'\theta' \models \Phi$. Thus, $M'\theta' \models F \land \Phi$.


The correctness of the ADD rule is now immediate: 


Corollary 4. Let $F \,\|\, \theta$ be derived from $F_0 \,\|\, \epsilon$, and $\Phi$ clean with respect to $\theta$,
then $F_0 \land \Phi$ simplifies modulo $\theta$ to $F \land \Phi\theta$.


Proof. It follows from Lemma 5. 


With Proposition 1 we established that RIGID, FLEX and UPDATE maintain
$F_0 \Rightarrow_{\theta} F$. We need to show the same for rule UNDO. The first step is to establish
that the formula removed by each of the pre-processing rules can be re-added
without affecting simplification.


Lemma 6. Given an inference $F \,\|\, \theta \Longrightarrow F' \,\|\, \theta(x \mapsto t; \Psi)^{\beta}$ by either of the
rules RIGID, UPDATE, FLEX the formula $F$ simplifies to $F', \Psi$ modulo $\epsilon$.


Proof. The proof is by case analysis on the rule that is applied.

- FLEX: Then $F' = F[t/x]$, $\Psi \subseteq F$ and therefore $F' \land \Psi = F \land \Psi[t/x]$. From
  the side condition $\Psi \Rightarrow_{x \mapsto t} \Psi[t/x]$, for every model of $F$ there is a model of $\Psi[t/x]$
  that agrees with it on symbols from $F$. Conversely $F', \Psi$ properly contains $F$ and
  therefore implies it. Therefore, $F \Rightarrow_{\epsilon} F', \Psi$.

- UPDATE: We want to show that $F, \Psi$ simplifies to $F, \Psi, \Phi$ modulo $\epsilon$. The
  premise of UPDATE ensures that for every $M \models F, \Psi$ there is a model agreeing
  with $M$ on symbols in $F, \Psi$, that satisfies $F, \Phi$. Since the interpretation of the
  symbols in $\Psi$ is unchanged it also satisfies $\Psi$. Conversely, if $M' \models F, \Psi, \Phi$,
  then already $M' \models F, \Psi$ and therefore $M'\epsilon \models F, \Psi$.

- RIGID: We wish to establish that $F \Rightarrow_{\epsilon} F', \Psi$. First observe that $F', \Psi =
  F, \Psi[t/x]$. Since $\Psi$ implies the equation $\exists y \,.\, x \simeq t$, for every model of $F$
  there is a solution for $y$ such that $\Psi[t/x]$ holds in a model that agrees with it on the
  variables in $F$. Conversely, if $F, \Psi[t/x]$ is satisfied by $M'$, then $M'$ already satisfies $F$.


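Lemma 7. Given
$F \parallel \theta(x \mapsto t; \Psi)\theta' \Longrightarrow_{\mathrm{UNDO}} F, \Psi\theta' \parallel \theta\theta'$,
s.t. $F_0 \succeq_{\theta(x \mapsto t; \Psi)\theta'} F$, then
$F_0 \succeq_{\theta\theta'} F, \Psi\theta'$ holds.


Proof. Given an inference
$F_1 \parallel \theta \Longrightarrow F_2 \parallel \theta(x \mapsto t; \Psi)$,
Lemma 6 establishes that the formula $F_1$ simplifies to $F_2, \Psi$ modulo
$\epsilon$. Lemma 5 establishes that $F_2, \Psi$ simplifies to
$F, \Psi\theta'$ modulo $\theta'$. Chaining the definition of simplification
modulo transitively establishes the lemma.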


With Corollary 4 and Lemma 7 we have then established Proposition 2. 

It is worth examining why the side-conditions for simplification modulo are 
used. As the following example shows, transformations that only preserve satis- 
fiability but strengthen formulas cannot be used easily in an incremental setting. 


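Example 6. Let $F_0$ be the satisfiable formula
$x \simeq y \wedge y \le z \wedge z \simeq v$. In that formula $x, y$ are
equal, and $z, v$ are equal. Let us assume that we simplify via the solution
where the classes are merged (i.e., where $y \simeq z$). This is
satisfiability preserving. It suggests a transformation that we call FLEX'.


$$x \simeq y \wedge y \le z \wedge z \simeq v \parallel \epsilon
\;\Longrightarrow_{\mathrm{FLEX}'}\;
x \simeq z \wedge z \simeq v \parallel (y \mapsto z; (x \simeq y \wedge y \le z))$$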


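The resulting state is still satisfiable. Now UNDO can be applied without any
problems. The result is still satisfiable, but not equivalent to $F_0$ (it
does not have the models where the two equivalence classes are not merged).


$$x \simeq z \wedge z \simeq v \parallel (y \mapsto z; (x \simeq y \wedge y \le z))
\;\Longrightarrow_{\mathrm{UNDO}}\;
(x \simeq y \wedge y \le z) \wedge x \simeq z \wedge z \simeq v \parallel \epsilon$$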


Adding the constraint $y \simeq z - 1$ to $F_0$ keeps it satisfiable, but
adding it to the formula obtained after UNDO makes it unsatisfiable.


4 Simplification Methods


Many simplification methods used in practice during pre-processing are
equivalence preserving. These methods include formula rewriting, constant
propagation, NNF conversion, quantifier elimination, and bit-blasting. They do
not require the methodology from this paper and have been integral to Z3 since
its inception. We discuss here the main pre-processing simplification routines
that do not preserve equivalence and how they relate to our taxonomy.


4.1 Equality Solving 


One of the most useful pre-processing techniques eliminates symbols when they
can be solved, that is, when a constraint implies an equality $x \simeq t$,
where $t$ is a term that does not contain $x$. Equality solving corresponds to
finding unitary solutions to unification problems modulo theories. Most uses
of equality solving are captured by transformations justified by rule RIGID.
In Z3, equality solving is a two-stage process:


1. Extract a set of solution candidates $E$ implied by the current formula
$\varphi$.
2. Extract from $E$ a subset of solutions that can be oriented without
introducing cyclic dependencies.


To elaborate, let $E$ be a set of solution candidates
$x_1 = t_1, \ldots, x_n = t_n$. The candidates may contain multiple equalities
using the same symbol. For example, $E$ could be
$x = f(x), x = g(y), y = h(z)$. We cannot use the solution $x = f(x)$ because
$x$ already occurs in $f(x)$. But we can use the solutions
$x = g(y), y = h(z)$, processed in this order: first $x$ is replaced by
$g(y)$, then $y$ is replaced by $h(z)$. In the second stage we extract from
$E$ a subset of equalities $x_{i_1} = t_{i_1}, \ldots, x_{i_k} = t_{i_k}$
where the $x_{i_j}$ are distinct and the $t_{i_j}$ are terms such that
$x_{i_j} \notin t_{i_{j'}}$ for $j \le j'$. The subset is in triangular form.
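The second stage can be illustrated with a small sketch. It is not Z3's implementation: the term representation (strings for variables, tuples for applications) and the helper names `term_vars`, `substitute` and `triangular` are assumptions made for this illustration, and the greedy strategy that normalizes each candidate by the solutions already chosen is only one possible way to obtain a triangular subset.

```python
def term_vars(t):
    """Variables of a term: a str is a variable, a tuple is (function, *args)."""
    if isinstance(t, str):
        return {t}
    vs = set()
    for arg in t[1:]:
        vs |= term_vars(arg)
    return vs


def substitute(t, x, s):
    """Replace every occurrence of variable x in term t by term s."""
    if isinstance(t, str):
        return s if t == x else t
    return (t[0],) + tuple(substitute(arg, x, s) for arg in t[1:])


def triangular(candidates):
    """Greedily pick an oriented, acyclic subset of the candidates (x, t)."""
    selected = []
    for x, t in candidates:
        if any(x == y for y, _ in selected):
            continue  # x is already solved by an earlier candidate
        for y, s in selected:
            t = substitute(t, y, s)  # normalize t by the earlier solutions
        if x in term_vars(t):
            continue  # occurs check fails: orienting x = t would be cyclic
        selected.append((x, t))
    return selected


# The candidates from the text: x = f(x), x = g(y), y = h(z).
E = [("x", ("f", "x")), ("x", ("g", "y")), ("y", ("h", "z"))]
print(triangular(E))
```

On the candidates from the text, the sketch rejects $x = f(x)$ and keeps $x = g(y)$ followed by $y = h(z)$. On a cyclic system such as $x = y + 1, y = z + 1, z = f(x)$ it keeps only two of the three candidates.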


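Example 7. We illustrate two applications of RIGID for eliminating two symbols
from three equations. The choice of the first two equations is arbitrary. An
alternative simplification could choose to eliminate $x$ and $z$ instead. It
is not possible, however, to eliminate all three variables.


$$\begin{array}{l}
F, x \simeq y + 1, y \simeq z + 1, z \simeq f(x) \parallel \epsilon \\
\Longrightarrow_{\mathrm{RIGID}} F[y+1/x], y \simeq z + 1, z \simeq f(y + 1) \parallel (x \mapsto y + 1; x \simeq y + 1) \\
\Longrightarrow_{\mathrm{RIGID}} F[y+1/x, z+1/y], z \simeq f(z + 2) \parallel (x \mapsto y + 1; x \simeq y + 1)(y \mapsto z + 1; y \simeq z + 1)
\end{array}$$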


The set of unification modulo theories facilities used in Z3 is based on
extracting simple definitions. Foremost, for a conjunct $x \simeq t$ of
$\varphi$, where $x$ is uninterpreted and $x \notin t$, include the equality
candidate $x \simeq t$. Other equality candidates are included from formulas
of the form $\mathit{ite}(c, x \simeq t, x \simeq s)$ and from arithmetic
equalities of the form $x + s \simeq t$, such that $x \simeq t - s$ is a
solution candidate for $x$. Note that solution candidates are not necessarily
unique for an equality. The constraint $x + y \simeq t$ can be used as a
solution for both $x$ and $y$. If $x$ has a nested occurrence within $t$, the
solution for $y$, but not $x$, can be used. Equality solving interacts with
simplification pre-processing: equalities over algebraic data-types can be
assumed to be in decomposed form already, since rewriting simplification
decomposes equalities of the form
$\mathit{cons}(h_1, t_1) \simeq \mathit{cons}(h_2, t_2)$ into
$h_1 \simeq h_2 \wedge t_1 \simeq t_2$.

Equality solving can be extended modulo theories in several directions.
Arithmetical equalities can be extracted from Diophantine equation solving and
from polynomial equality factorization as part of establishing a Gröbner
basis. Equalities can be extracted from inequalities [6,31]. Other theories,
such as the theory of arrays, allow extracting solutions from equalities
$\mathit{store}(a, i, v) \simeq t$, where $a$ is a symbol that does not occur
in $t, i, v$, as $a \simeq \mathit{store}(t, i, w)$, together with the
constraint $\mathit{select}(t, i) \simeq v$, where $w$ is fresh. We leave a
study of the costs and benefits of these approaches within the context of
incremental pre-processing to future work.

Equality solving is extended to sub-formulas in the following way: when a
positive sub-formula implies an equality $x \simeq t$ and the symbol $x$ does
not occur outside of the sub-formula, then $x$ can be replaced by $t$ within
the sub-formula. The solution is no longer rigidly constrained but can be
justified by FLEX.


Example 8. Suppose $x \notin F, \Psi$, then we can use FLEX to justify the
simplification


$$F, (x \simeq t \wedge \Phi[x]) \vee \Psi \parallel \theta
\;\Longrightarrow\;
F, \Phi[t] \vee \Psi \parallel \theta(x \mapsto t; (x \simeq t \wedge \Phi[x]) \vee \Psi)$$


4.2 Unconstrained Sub-terms 


Symbols that have a single occurrence in a formula may be solved for based on
context. For example, with the formula
$x \le y, y < z, z \le u, p(u), q(u)$, the constant $x$ can be eliminated by
using the solution $x \simeq y$. Then $y$ can be eliminated by setting
$y \simeq z - 1$, and finally $z \simeq u$.

Invertibility of unconstrained symbols (see e.g. [7,8]) in an incremental
setting for bit-vectors was introduced in [19]. The method implements the
following proof rule, exemplified for the term $x + t$, containing the only
occurrence of $x$.


INVERT: $F[x + t] \parallel \theta \Longrightarrow F[y] \parallel \theta(x \mapsto y - t; y \simeq x + t)$
if $x$ occurs uniquely in $F$ and $y$ is fresh


To justify rule INVERT in our setting, it suffices to check the condition from
Lemma 6. Alternatively, we can use the generic rule UPDATE when applying
unconstrained simplifications. The rule INVERT is more efficient than using
UPDATE because the latter requires adding back an entire conjunction $\Psi$
where the invertible term $x + t$ occurs. Invertibility can also be used to
justify elimination of nested definitions. A definition
$F \wedge ((x \simeq t \wedge \Phi[x]) \vee \Psi)$ (see Example 8), where
$x \notin F, \Psi$, can first be rewritten as
$F \wedge ((x \simeq t \wedge \Phi[t]) \vee \Psi)$. Then $x \simeq t$ is
invertible because it contains the only occurrence of $x$. The new constraint
is $F \wedge ((y \wedge \Phi[t]) \vee \Psi)$ where $y$ is a fresh Boolean
symbol.

Invertibility conditions are theory dependent. Figure 4 exemplifies the main
invertibility conditions for arithmetic.¹


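$$\begin{array}{lll}
F[t - x] \parallel \theta & \Longrightarrow_{\mathrm{INVERT}} & F[y] \parallel \theta(x \mapsto t - y; y \simeq t - x) \\
F[x \cdot x'] \parallel \theta & \Longrightarrow_{\mathrm{INVERT}} & F[y] \parallel \theta(x, x' \mapsto y, 1; y \simeq x \cdot x') \\
F[x \le t] \parallel \theta & \Longrightarrow_{\mathrm{INVERT}} & F[y] \parallel \theta(x \mapsto \mathit{ite}(y, t, t + 1); y \simeq x \le t) \\
F[t \le x] \parallel \theta & \Longrightarrow_{\mathrm{INVERT}} & F[y] \parallel \theta(x \mapsto \mathit{ite}(y, t, t - 1); y \simeq t \le x)
\end{array}$$


Fig. 4. Invertibility rules for symbols $x, x'$ that occur uniquely in $F$;
$y$ is fresh.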
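The witnesses used in such rules can be sanity-checked by brute force over a small range of integers. The sketch below assumes the following reading of the arithmetic rules: $x + t$ is inverted by $x := y - t$, $t - x$ by $x := t - y$, $x \cdot x'$ by $x, x' := y, 1$, $x \le t$ by $x := \mathit{ite}(y, t, t + 1)$, and $t \le x$ by $x := \mathit{ite}(y, t, t - 1)$. The names `RULES`, `witness_is_correct` and `product_rule_is_correct` are hypothetical, and the check covers only the sampled values, so it is a plausibility test rather than a proof.

```python
from itertools import product

INTS = range(-3, 4)

# name -> (term built from the eliminated symbol x and a parameter t,
#          witness for x computed from the fresh symbol y and t)
RULES = {
    "x + t":  (lambda x, t: x + t,  lambda y, t: y - t),
    "t - x":  (lambda x, t: t - x,  lambda y, t: t - y),
    "x <= t": (lambda x, t: x <= t, lambda y, t: t if y else t + 1),
    "t <= x": (lambda x, t: t <= x, lambda y, t: t if y else t - 1),
}


def witness_is_correct(name):
    """For every target value y, the witness makes the term evaluate to y."""
    term, witness = RULES[name]
    targets = (False, True) if "<=" in name else INTS
    return all(term(witness(y, t), t) == y for y, t in product(targets, INTS))


def product_rule_is_correct():
    """x * x' with the witness x := y, x' := 1."""
    return all(x * x1 == y for y in INTS for x, x1 in [(y, 1)])


for name in RULES:
    print(name, witness_is_correct(name))
print("x * x'", product_rule_is_correct())
```

Note that the Boolean cases only work for the non-strict comparison: with a strict $x < t$ the witness $x := t$ would not make the atom true.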


Z3 uses a heap ordered by occurrence counts to identify candidates for invert- 
ibility. It first processes all symbols with occurrence count 1. If it is possible to 
eliminate a symbol with occurrence count 1, the occurrence counts of sub-terms 
under the term that gets eliminated are decreased. The elimination process stops 
once the heap only contains symbols with occurrence counts above 1. 


4.3 Symbol Elimination and Macros 


SAT solvers use symbol elimination [15] to simplify clauses. The first-order
version [11] remains timely in more recent works as well [28]. A predicate $p$
can be eliminated if it occurs at most once in every clause, either positively
or negatively. Clauses that contain $p$ are replaced by their resolvents:
binary resolution on $p$ is applied exhaustively, and the clauses containing
$p$ are then removed.
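For the propositional case, the elimination step and the accompanying model repair can be sketched as follows. This is an illustration only, not the code of any particular solver: clauses are sets of nonzero integers (a negative number is a negated symbol), and the function names `eliminate`, `holds` and `extend_model` are assumptions of the sketch.

```python
def eliminate(clauses, p):
    """Eliminate symbol p by exhaustive binary resolution on p.

    Returns the reduced clause set and the removed clauses, which are kept
    so that a model of the reduced set can later be extended to p.
    """
    clauses = [frozenset(c) for c in clauses]
    pos = [c for c in clauses if p in c]
    neg = [c for c in clauses if -p in c]
    rest = [c for c in clauses if p not in c and -p not in c]
    resolvents = []
    for c in pos:
        for d in neg:
            r = (c - {p}) | (d - {-p})
            if not any(-l in r for l in r):  # skip tautological resolvents
                resolvents.append(r)
    return rest + resolvents, pos + neg


def holds(clause, model):
    """A clause holds if one of its literals is true in the model."""
    return any(model.get(abs(l), False) == (l > 0) for l in clause)


def extend_model(model, removed, p):
    """Choose a value for p under which all removed clauses hold."""
    for value in (False, True):
        candidate = {**model, p: value}
        if all(holds(c, candidate) for c in removed):
            return candidate
    return None


# p = 1 occurs in (p or a) and (not p or b); the clause (c) is untouched.
reduced, removed = eliminate([[1, 2], [-1, 3], [4]], 1)
print(reduced)
print(extend_model({2: False, 3: True, 4: True}, removed, 1))
```

The removed clauses play the role of the formulas recorded alongside the model update: they are exactly what is needed to fix the value of the eliminated symbol once a model of the reduced clause set is known.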


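Example 9. We illustrate symbol elimination for the ground case with two
clauses, and $F$ such that $p \notin F$, as an instance of the UPDATE rule.


$$\begin{array}{l}
F, p(t) \vee \Phi, \neg p(s) \vee \Psi \parallel \theta \;\Longrightarrow_{\mathrm{UPDATE}} \\
F, s \not\simeq t \vee \Phi \vee \Psi \parallel \theta(p \mapsto \lambda x \,.\, p(x) \vee (x \simeq t \wedge \neg\Phi); p(t) \vee \Phi, \neg p(s) \vee \Psi)
\end{array}$$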


The same elimination technique can also be applied to Horn clauses where $p$
does not occur both in the head and body of any rule. A solution for the
eliminated predicate is a conjunction of the upper bounds for $p$ or a
disjunction of lower bounds for $p$. It is generally a quantified formula. If
the involved clauses admit quantifier-free interpolants, the solution can also
be computed using an interpolant from a solution to the reduced system [4].
Thus, the term $t$ in a substitution $x \mapsto t$ may only be computed after
an initial model is known.

There are many cases where symbols can be eliminated incrementally and 
justified by the RIGID rule: 


— Macros $\forall x \,.\, f(x) \simeq t[x]$. Formulas
$\forall x \,.\, f(x) + s \simeq t$ are handled as
$\forall x \,.\, f(x) \simeq t - s$, assuming $f$ is not free in $s, t$. Then
replace occurrences $f(a)$ by $t[a]$, respectively $t[a] - s[a]$.

— Quasi macros $\forall x, y \,.\, f(x, y, x + y) \simeq t[x, y]$, then
replace $f(a, b, c)$ by
$\mathit{ite}(c \simeq a + b, t[a, b], f'(a, b, c))$, assuming $f \notin t$.

— Conditional macros $\forall x \,.\, f(x) \simeq t[x] \vee C[x]$, then
replace $f(a)$ by $\mathit{ite}(C[a], f'(a), t[a])$, where $f \notin t, C$.

— $(f(x) \simeq t) \simeq \varphi$, where $f \notin t, \varphi$. Then replace
$f(a)$ by $\mathit{ite}(\varphi, t, f'(a))$ and add the clause
$\forall x \,.\, f'(x) \not\simeq t$.


1 A summary of rules used for other theories can be found online:
https://microsoft.github.io/z3guide/docs/strategies/summary#tactic-elim-uncnstr.


Macro elimination can be extended to ordered structures and to combinations
of theories [42]. It has been integral to making quantified reasoning with
bit-vectors [44] practical. We claim that first-order in-processing rules
based on blocked clauses, asymmetric tautology elimination, and covered
clauses, known from SAT [29], can also be captured by UPDATE. We substantiate
the claim with an example, but leave a comprehensive treatment for future
work:


Example 10. Consider the clause $C := p(x) \vee q(x)$ and
$F := \neg p(x) \vee p(f(x)) \vee r(x), \neg p(x) \vee p(f(x)) \vee p(g(x))$.
The variable $x$ is universally quantified. Then $C$ can be rewritten to
$p(x) \vee q(x) \vee p(f(x))$ without affecting satisfiability. The covered
literal $p(f(x))$ was added to $C$ as it occurs in every resolvent with
$p(x)$. The model for $p$ has to be fixed, however. The model update is a
first-order lifting of the propositional case.


$$\begin{array}{l}
F, p(x) \vee q(x) \parallel \theta \;\Longrightarrow_{\mathrm{UPDATE}} \\
F, p(x) \vee q(x) \vee p(f(x)) \parallel \theta(p \mapsto \lambda x \,.\, p(x) \vee p(f(x)); \forall x \,.\, p(x) \vee q(x))
\end{array}$$

To illustrate unification constraints in model updates, consider the clause
$C := p(h(x)) \vee q(x)$ and
$p' := \lambda x \,.\, p(x) \vee \exists y \,.\, x \simeq h(y) \wedge \neg q(y)$:

$$\begin{array}{l}
F, p(h(x)) \vee q(x) \parallel \theta \;\Longrightarrow_{\mathrm{UPDATE}} \\
F, p(h(x)) \vee q(x) \vee p(f(h(x))) \parallel \theta(p \mapsto p'; \forall x \,.\, p(h(x)) \vee q(x))
\end{array}$$


5 Implementation 


We have implemented incremental pre-processing as an integral component 
of a new SMT solver, part of Z3. It can be enabled by setting the option 
sat.smt=true from the command line. It includes simplification by equality 
solving, elimination of uninterpreted sub-terms and macro detection as described 
in Sect. 4. The primary reason for supporting incremental pre-processing has 
been usability. GitHub issues pointing to performance cliffs when switching to 
incremental mode are recurrent. A distilled example where pre-processing can 
solve formulas is as follows: 


Example 11. Consider the following benchmark:


(set-option :unsat_core true)
(set-option :sat.smt true)
(declare-const exp Int)
(push)
; the named assumption fixes the exponent
(assert (! (= exp 1) :named assumption))
; 2^exp must differ from 2, which contradicts exp = 1
(assert (not (= 2 (^ 2 exp))))
(check-sat)
(get-unsat-core)


The legacy solver of Z3 cannot solve it because it only knows about constant
folding when expanding the definition of exponentiation (the symbol ^). With
incremental propagation, the constraint (not (= 2 (^ 2 exp))) simplifies to
false.


2 See https://microsoft.github.io/z3guide/docs/strategies/simplifiers for a
summary of simplifiers.


Simplifiers interoperate with user scopes: SMT solvers support scoping using
the operations push and pop. All assertions made within a push are invalidated
by a matching pop. To allow simplifiers to interoperate with recursive
function definitions, they track symbols used in the bodies of recursive
functions as frozen. Those symbols are excluded from solving. Similar to
CaDiCaL's implementation for replaying clauses (see [18]), our implementation
of Replay stores the domain of $\theta$ in a hash-set to bypass processing
formulas that have no symbols in $\theta$.
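The bypass can be pictured with a small sketch. It is not Z3's code: the class `Trail`, its methods, and the term representation (strings for symbols, tuples for applications) are assumptions of this illustration, and it only covers replaying rigid substitutions, not re-adding removed formulas.

```python
def symbols(f):
    """Symbols of a formula: a str is a symbol, a tuple is (operator, *args)."""
    if isinstance(f, str):
        return {f}
    out = set()
    for arg in f[1:]:
        out |= symbols(arg)
    return out


def substitute(f, x, t):
    """Replace every occurrence of symbol x in f by t."""
    if isinstance(f, str):
        return t if f == x else f
    return (f[0],) + tuple(substitute(arg, x, t) for arg in f[1:])


class Trail:
    def __init__(self):
        self.entries = []    # substitutions x -> t, in the order recorded
        self.domain = set()  # hash-set of all substituted symbols

    def record(self, x, t):
        self.entries.append((x, t))
        self.domain.add(x)

    def replay(self, formula):
        # Fast path: no symbol of the formula was ever substituted, so
        # replaying the trail cannot change it.
        if self.domain.isdisjoint(symbols(formula)):
            return formula
        for x, t in self.entries:
            formula = substitute(formula, x, t)
        return formula


trail = Trail()
trail.record("x", ("+", "y", ("1",)))
print(trail.replay(("<=", "u", "v")))  # untouched, returned as is
print(trail.replay(("<=", "x", "v")))  # x is replaced by y + 1
```

The fast path is sound because a substitution can only introduce symbols into a formula that already mentions a symbol from the domain.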


6 Related Work 


6.1 Pre- and In-processing for SAT and QBF 


Pre-processing for SAT has received significant attention with the milestone
work in SatELite [15] and then using notions of blocked clauses [27] and
solution reconstruction [25]. Pre-processing techniques for QBF are discussed
for example in [3,22]. The main pre-processing methods for propositional
satisfiability solvers can be captured using our rule UPDATE (see Example 4
for an instance of blocked clause elimination simplification). For the case
where $\neg p \vee D$ is a blocked clause, the model update is the de Morgan
dual: removing $\neg p \vee D$ triggers the update
$M[p \mapsto (p \wedge D)^M]$.

The work [18] introduces an inference system that also addresses redundant
clauses and represents model updates using a notion of witness-labeled
clauses. The semantic content of the rules used for SAT is captured by UPDATE.
However, we elided tracking redundant clauses in this work. The case for SMT
motivates the specialized rules RIGID, FLEX and INVERT.


6.2 Pre-processing for SMT 


Pre-processing simplification is integral to all main SMT solvers, including
[2,33]. Incremental pre-processing with special attention to bit-vectors was
introduced in [19]. Transformations considered in that thesis can be
represented by the RIGID and INVERT rules. Z3 exposes pre-processing
simplifications as tactics [13] and allows users to compose them to suit
specific needs of applications.

Invertibility conditions are used in [34] to guide local search. That work
also considers candidate values of all symbols. For example, $F[x \cdot t]$ is
invertible to $F[y]$ if $t$ evaluates to 1.


6.3 Pre-processing for MIP 


Pre-solving is the term used for pre-processing in mixed-integer linear
programming (MIP) solvers. There is a significant repertoire of pre-solving
methods integrated in leading MIP solvers. Their effects are well documented
in the newer survey [1], which provides an updated perspective on [20].
Pre-solving was developed earlier in [40]. The main methods can be categorized
as operating on single rows (single constraints) or single columns (single
variables), on multiple rows, or on multiple columns, and as using global
information about the tableau. They also include methods known from other
domains, such as literal probing, also found in SAT solvers, and symmetry
reduction for sparse systems [38]. We are not aware of under-constrained
simplifications used in mainstream MIP solvers. Only symmetry reduction stands
out as outside the scope of incremental pre-solve methods.


Example 12. Pre-processing that combines two rows or two columns relies on
efficient indexing [21] to be effective. The two-column non-zero cancellation
method considers the situation where the coefficients of two variables
maintain a high degree of correlation. Consider the following formula


$$2x + 4y + z \le 5 \;\wedge\; x + 2y + u \le 6 \;\wedge\; 3x + y + z \le 3 \;\wedge\; \varphi \quad \text{where } x, y \notin \varphi.$$


The coefficients of $x, y$ in the first two inequalities are related by the
affine relation given by $\lambda = 2$. In this case the system can be
reformulated, justified by rule RIGID, by introducing a fresh variable $v$ and
using the inequalities


$$2v + z \le 5 \;\wedge\; v + u \le 6 \;\wedge\; 3v - 5y + z \le 3 \;\wedge\; \varphi.$$
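The reformulation can be checked numerically. The sketch assumes the example reads $2x + 4y + z \le 5$, $x + 2y + u \le 6$, $3x + y + z \le 3$ before, and $2v + z \le 5$, $v + u \le 6$, $3v - 5y + z \le 3$ after, with the fresh variable standing for $v = x + 2y$ (equivalently $x = v - 2y$). The function names are hypothetical. The code compares the left-hand sides on random integer points.

```python
import random


def original(x, y, z, u):
    """Left-hand sides of the three inequalities before the reformulation."""
    return (2 * x + 4 * y + z, x + 2 * y + u, 3 * x + y + z)


def reformulated(v, y, z, u):
    """Left-hand sides after substituting x = v - 2y."""
    return (2 * v + z, v + u, 3 * v - 5 * y + z)


random.seed(0)
for _ in range(1000):
    x, y, z, u = (random.randint(-50, 50) for _ in range(4))
    v = x + 2 * y  # the fresh variable stands for x + 2y
    assert original(x, y, z, u) == reformulated(v, y, z, u)
print("left-hand sides agree on all sampled points")
```

The third row shows where the coefficient $-5$ comes from: $3x + y = 3(v - 2y) + y = 3v - 5y$.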


6.4 Pre-processing in First- and Higher-Order Provers 


Pre-processing is also an important part of first-order theorem provers.
Techniques for creating small clausal normal forms have long attracted
attention [35]. The main simplifications [24] are based on detecting
definitions, similar to what is described in Sect. 4.3, but with the extra
twist of ensuring that simplifications preserve first-order decidability, such
as ensuring that formulas remain within the EPR fragment. Furthermore, a
variant of AIGs with nodes representing quantifiers is used to detect shared
structure. While [24] is only concerned with establishing preservation of
satisfiability, we note that the classification as model equivalent from
Sect. 4.3 extends to the cases considered. In-processing inspired by SAT was
pursued for first-order [29,43] and recently for higher-order settings [5].


6.5 Constrained Horn Clauses 


Constrained Horn Clauses [4] enjoy a tight connection with Logic Programming,
where several transformation techniques were developed [10,12], including
incremental consequence propagation [36]. Fold [9] transformations introduce
auxiliary predicates and rules that correspond to replacing a code block with
an auxiliary procedure. They are justified by RIGID. Unfold transformations
can be justified by UPDATE and correspond to macro elimination.


7 Summary


We introduced a calculus of pre-processing for SMT. It distinguishes
simplifications that are rigid and so can be applied to new formulas as
substitutions. Other simplified formulas may need to be re-introduced, similar
to re-introducing removed clauses in SAT. We examined several of the
pre-processing methods studied in SAT, ATP, MIP and SMT as instances of the
calculus. We leave empirical and algorithmic studies of new pre- and
in-processing methods to future work. Another angle we have left on the table
is reconciling pre-processing with in-processing. For SAT, it was useful to
develop a calculus that accounted for both irredundant and redundant clauses.
In our current effort we have set this angle aside in favour of establishing
the main properties of replaying substitutions.


Acknowledgment. Thanks to the reviewers for their extensive constructive feed- 
back and to Diego Olivier Fernandez Pons for introducing us to MIP pre-solving. The 
research was partially funded by the Austrian Science Fund (FWF) under project 
No. T-1306.


Abstract. Resolution and superposition provers rely on the given clause
procedure to saturate clause sets. Using Isabelle/HOL, we formally ver- 
ify four variants of the procedure: the well-known Otter and DISCOUNT 
loops as well as the newer iProver and Zipperposition loops. For each of 
the variants, we show that the procedure guarantees saturation, given a 
fair data structure to store the formulas that wait to be selected. Our for- 
malization of the Zipperposition loop clarifies some fine points previously 
misunderstood in the literature. 


Keywords: Saturation provers - Proof assistants - Stepwise refinement 


1 Introduction 


Resolution [13] and superposition [2] provers are based on saturation. In a first 
approximation, these provers perform all possible inferences between the avail- 
able clauses. The full truth, however, is more complex: Provers may delete clauses 
that are considered redundant; for example, with resolution, if p(x) is in the 
clause set, then both p(a) and p(x) V q(x) are redundant and could be deleted. 

The procedure that saturates a set of clauses—or more generally, formulas— 
up to redundancy is called the given clause procedure [10, Sect. 2.3]. It has several 
variants. The two main variants are the Otter loop [10] and the DISCOUNT loop 
[1]. In this paper, we also consider the iProver [8] and Zipperposition loops [17]; 
they are variants of the Otter and DISCOUNT loops, respectively. 

In its simplest form, the procedure distinguishes between passive and active
formulas. Formulas start as passive. One passive formula is selected as the
given clause.¹ Deletions and simplifications with respect to other passive and
active formulas are then performed; for example, if the given clause is
redundant with respect to an active formula, the given clause can be deleted,
and if the given clause makes an active formula redundant, that formula can be
deleted. Moreover, simplifications can take place; for example, in a
superposition prover, if the term order specifies $b \succ a$, the given
clause is $b \approx a$, and $p(b)$ is an active formula, the active formula
can be simplified to $p(a)$ and made passive again.


1 We keep the traditional name “given clause” even though our formulas are not
necessarily clauses.

Next, if the given clause has not been deleted, it is moved to the active
set. All inferences between the given clause and formulas in the active set
are then performed, and the resulting conclusions are put in the passive set.
This procedure is repeated, starting with the selection of a new given clause,
until the distinguished formula $\bot$ is derived or the passive set is empty.
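The procedure described so far can be sketched for propositional resolution. This is a toy illustration, not one of the formalized loops: clauses are sets of nonzero integers, redundancy is approximated by subsumption and tautology deletion only, a FIFO queue stands in for a fair selection strategy, and the names `resolvents` and `given_clause_loop` are assumptions of the sketch.

```python
from collections import deque


def resolvents(c, d):
    """Binary resolvents of two propositional clauses (frozensets of ints)."""
    out = []
    for lit in c:
        if -lit in d:
            r = (c - {lit}) | (d - {-lit})
            if not any(-l in r for l in r):  # discard tautologies
                out.append(frozenset(r))
    return out


def given_clause_loop(clauses):
    # FIFO selection: no passive clause is ignored forever.
    passive = deque(dict.fromkeys(frozenset(c) for c in clauses))
    active = []
    while passive:
        given = passive.popleft()
        if not given:  # the empty clause plays the role of bottom
            return "unsat", active
        if any(a <= given for a in active):
            continue  # the given clause is subsumed by an active clause
        # Delete active clauses made redundant by the given clause.
        active = [a for a in active if not given <= a]
        active.append(given)
        # Perform all inferences between the given clause and the active set.
        for a in active:
            for r in resolvents(given, a):
                if r not in passive:
                    passive.append(r)
    return "saturated", active


print(given_clause_loop([[1, 2], [-1, 2], [1, -2], [-1, -2]])[0])
print(given_clause_loop([[1, 2], [-1]]))
```

In the second call the unit clause derived by resolution subsumes the original binary clause, so the saturated active set contains only the two unit clauses.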

The main metatheorem about this procedure states that if the given clause
is chosen fairly (i.e., no passive formula is ignored forever), then the
active set will be saturated (up to redundancy) at the limit. As a corollary,
if the proof calculus is refutationally complete (i.e., it derives $\bot$ from
any inconsistent set), then the prover based on the calculus will be
refutationally complete as well.

We present an Isabelle/HOL [12] formalization of the Otter, DISCOUNT, 
iProver, and Zipperposition loops, culminating in a statement and proof of 
the main metatheorem for each one. We build on the pen-and-paper saturation 
framework developed by Waldmann, Tourret, Robillard, and Blanchette [18, 19] 
and formalized in Isabelle/HOL by Tourret and Blanchette [16]. The framework 
is an elaboration of Bachmair–Ganzinger-style saturation [3, Sect. 4]. Waldmann
et al. include descriptions of the four “loops” as instances of the framework, as 
Examples 71, 74, 81, and 82 [19]; our formalization follows these descriptions. 

Among the four loops, the oldest one is the Otter loop. It originates from 
Otter, a resolution-based theorem prover for first-order logic introduced in 1988 
[11]. Otter was the first prover to present the given clause algorithm, in its 
simplest form as described above. 

The DISCOUNT loop followed a few years later as a byproduct of the DISCOUNT
system [7], itself built to distribute proof tasks among processors. What
distinguishes a DISCOUNT loop is that it really treats the passive set as
passive. Its formulas serve only as the pool from which to choose the next
given clause; they are never involved in deletions or other simplifications.
Another key difference between the two loops is the decoupling of the
scheduling of an inference and the production of its conclusion, which makes
DISCOUNT able to propagate deletions and simplifications to discard inferences
before their conclusions enter the passive set. For example, suppose that, in
DISCOUNT, an inference


$$\frac{p(x) \vee p(a) \qquad \neg p(y) \vee q(y)}{p(x) \vee q(y)}$$


called $\iota$ is scheduled, in a derivation using first-order resolution.
Then suppose that, before $\iota$ is realized, $p(a)$ is generated (e.g., as
the result of the factorization of $p(x) \vee p(a)$). This triggers the
deletion of $p(x) \vee p(a)$, which has become redundant. Then $\iota$ becomes
an orphan inference, since one of its premises is no longer in the active set.
It can be deleted without threatening the procedure's completeness. In
contrast, in an Otter loop, if $\iota$ is scheduled before $p(a)$ is selected


Verified Given Clause Procedures 63 


as the given clause, $\iota$'s conclusion $p(x) \vee q(y)$ is directly added
to the passive set, where it can be simplified.

What we call the iProver loop [8] is an extension of the Otter loop with 
a transition that removes a formula C if C is made redundant by a formula 
set M. This terminology is from Waldmann et al. [19, Example 74]. This rule, 
introduced when iProver was extended to handle the superposition calculus [8], 
combines an inference step with a step that simplifies the active set. 

The last and most elaborate loop variant we present is the Zipperposition
loop. Zipperposition is a higher-order theorem prover based on
$\lambda$-superposition [4]. Its given clause procedure is designed to work
with higher-order logic. Due to the explosiveness of higher-order unification,
a single pair of premises can yield infinitely many conclusions. For example,
the higher-order resolution inference


$$\frac{p\,(f\,(y\,a)) \vee q\,y \qquad \neg p\,(z\,(f\,a))}{q\,(\lambda x.\, f\,(\ldots(f\,x)\ldots))}$$


where $y$ and $z$ are variables, produces infinitely many conclusions of the
form $q\,(\lambda x.\, f^n\,x)$ for $n \in \mathbb{N}$. Thus, the passive set
must be able to store possibly infinite sequences of lazily performed
inferences. The Zipperposition loop was described by Vukmirović et al. [17]
and by Waldmann et al. [19, Example 82].² Vukmirović et al. describe the
loop's implementation in Zipperposition, which we believe to be correct. In
contrast, Waldmann et al. present an abstract version of the loop and connect
it, via stepwise refinement, to their saturation framework, obtaining the main
metatheorem. However, in the latter work, the details are not worked out.
Thanks to the Isabelle formalization, we note and address several issues such
that we now have a first rigorous (in fact, fully formal) presentation of the
essence of the Zipperposition loop, including the metatheorem.

Our work is part of IsaFoL (Isabelle Formalization of Logic), an effort that 
aims at developing a formal library of results about logic and automated rea- 
soning [6]. The Isabelle files amount to about 7000 lines of code. They were 
developed using the 2022 edition of Isabelle and are available in the Archive of 
Formal Proofs (AFP) [5], where they are updated to follow Isabelle’s evolution. 

This work joins a long list of verifications of calculi and provers. We refer to 
Blanchette [6, Sect. 5] for an overview of such works. The most closely related 
works are the two proofs of completeness of Bachmair and Ganzinger’s resolu- 
tion prover RP, by Schlichtkrull, Blanchette, Traytel, and Waldmann [14] and 
by Tourret and Blanchette [16] as well as the proof of completeness of ordered 
(unfailing) completion by Hirokawa, Middeldorp, Sternagel, and Winkler [9]. 
Instead of focusing on a single prover, here we consider general prover architec- 
tures. Via refinement, our results can be applied to individual provers. 


2 Abstract Given Clause Procedures 


To prove the main metatheorem for each of the four loops, we build on the
saturation framework. The framework defines two highly abstract given clause
procedures, called GC (“given clause”) and LGC (“lazy given clause”)
[19, Sect. 4]. They are formalized in the file Given_Clause_Architectures.thy
of the AFP entry Saturation_Framework [15].


2 Both groups of researchers include Blanchette and Tourret.

GC is an idealized Otter-style loop. It operates on sets of labeled formulas.
Formulas have the generic type 'f, and labels have the generic type 'l. One
label, active, identifies active formulas, and the other labels correspond to
passive formulas. GC is defined as a two-rule transition system ⇝GC. In
Isabelle syntax:


inductive (⇝GC) :: ('f × 'l) set ⇒ ('f × 'l) set ⇒ bool where
  process: N1 = N ∪ M ⟹ N2 = N ∪ M' ⟹ M ⊆ Red_F (N ∪ M') ⟹
    active_subset M' = ∅ ⟹ N1 ⇝GC N2
| infer: N1 = N ∪ {(C, L)} ⟹ N2 = N ∪ {(C, active)} ∪ M ⟹
    L ≠ active ⟹ active_subset M = ∅ ⟹
    Inf_between (fst ` active_subset N) {C}
      ⊆ Red_I (fst ` (N ∪ {(C, active)} ∪ M)) ⟹
    N1 ⇝GC N2


When presenting Isabelle code, we will focus on the main ideas and not 
explain all the Isabelle syntax or all the symbols that occur in the code. We 
refer to Waldmann et al. [19] for mathematical statements of the key concepts 
and to the Isabelle theory files for the formal definitions. 

Informally, the transition relation ⇝GC is defined as an inductive predicate
equipped with two introduction rules, process and infer. Both rules allow a
transition from N1 to N2 under some conditions:


— The process rule replaces a subset M of N1 by M'. This is possible only if
the redundancy criterion (Red_F) justifies the replacement and the formulas in
M' are all made passive (i.e., the active subset of M' is the empty set). This
rule models formula simplification and deletion, but also replacing a passive
label by another, “greater” passive label.

— The infer rule makes a passive formula C active and performs all inferences
between this formula and active formulas, yielding M. Strictly speaking, the
inferences need not be performed at all; it suffices that M makes the
inferences redundant.


The main metatheorem for GC states that if the set of passive formulas is empty 
at the limit of a derivation, the active formula set is saturated at the limit. 

The lazy variant LGC generalizes the DISCOUNT loop. It operates on pairs
(T, N), where T :: 'f inference set is a set of inferences that have been
scheduled but not yet performed and N :: ('f × 'l) set is a set of labeled
formulas. It consists of four rules that can be summarized as follows:


— The process rule is essentially as in GC. It leaves the T component
unchanged.

— The schedule_infer rule makes a passive formula active and schedules all the
inferences between this formula and active formulas by adding them to the T
component.

— The compute_infer rule actually performs a scheduled inference or otherwise
ensures that it is made redundant by adding suitable formulas.

— The delete_orphan_infers rule can be used to delete a scheduled inference if
one of its premises has been deleted.


The main metatheorem for LGC states that if the set of scheduled inferences and 
the set of passive formulas are empty at the limit of a derivation starting in an 
initial state, the active formula set is saturated at the limit. 


3 Otter and iProver Loops 


The Otter loop [10] works on five-tuples (N, X, P, Y, A), where N is the set
of new formulas; X is a subsingleton (i.e., the empty set or a singleton {C})
storing a formula moving from N to P; P is the set of so-called passive
formulas (although, strictly speaking, the formulas in N, X, and Y are also
passive); Y is a subsingleton storing the given clause, which moves from P to
A; and A is the set of active formulas. All the sets are finite in practice.

Initial states have the form (N, ∅, ∅, ∅, ∅). Inferences are assumed to be
finitary, meaning that the set of inferences with C and formulas from A as
premises (formally written Inf_between A {C}) is finite if A is finite.
Premiseless inferences are disallowed.


Otter Loop without Fairness. The first version of the Otter loop, formalized 
in Otter_Loop.thy, does not make any fairness assumption on the choice of the 
given clause. The guarantee it offers is correspondingly weak: If the sets N, X, 
P, and Y are empty at the limit of a derivation starting in an initial state, then 
A is saturated. But there is no guarantee that N, X, P, and Y are empty at the 
limit. Later in this section, we will show how to ensure this generically. 

The transition system ⇝OL for the Otter loop without fairness is as follows:


inductive (⇝OL) :: ('f × OL_label) set ⇒ ('f × OL_label) set ⇒ bool where
  choose_n: C ∉ N ⟹
    state (N ∪ {C}, ∅, P, ∅, A) ⇝OL state (N, {C}, P, ∅, A)
| delete_fwd: C ∈ Red_F (P ∪ A) ∨ (∃C' ∈ P ∪ A. C' ⪯· C) ⟹
    state (N, {C}, P, ∅, A) ⇝OL state (N, ∅, P, ∅, A)
| simplify_fwd: C ∈ Red_F (P ∪ A ∪ {C'}) ⟹
    state (N, {C}, P, ∅, A) ⇝OL state (N, {C'}, P, ∅, A)
| delete_bwd_p: C' ∈ Red_F {C} ∨ C ≺· C' ⟹
    state (N, {C}, P ∪ {C'}, ∅, A) ⇝OL state (N, {C}, P, ∅, A)
| simplify_bwd_p: C' ∈ Red_F {C, C''} ⟹
    state (N, {C}, P ∪ {C'}, ∅, A) ⇝OL state (N ∪ {C''}, {C}, P, ∅, A)
| delete_bwd_a: C' ∈ Red_F {C} ∨ C ≺· C' ⟹
    state (N, {C}, P, ∅, A ∪ {C'}) ⇝OL state (N, {C}, P, ∅, A)
| simplify_bwd_a: C' ∈ Red_F {C, C''} ⟹
    state (N, {C}, P, ∅, A ∪ {C'}) ⇝OL state (N ∪ {C''}, {C}, P, ∅, A)
| transfer: state (N, {C}, P, ∅, A) ⇝OL state (N, ∅, P ∪ {C}, ∅, A)
| choose_p: C ∉ P ⟹
    state (∅, ∅, P ∪ {C}, ∅, A) ⇝OL state (∅, ∅, P, {C}, A)
| infer: Inf_between A {C} ⊆ Red_I (A ∪ {C} ∪ M) ⟹
    state (∅, ∅, P, {C}, A) ⇝OL state (M, ∅, P, ∅, A ∪ {C})


The state function converts a five-tuple into a set of labeled formulas—an equiv- 
alent representation that is often more convenient formally. The labels are New 
(for N), XX (for X), Passive (for P), YY (for Y), and Active (for A, corresponding 
to active in GC). 

The first nine rules all refine GC’s process rule, whereas the tenth rule, infer, 
refines GC’s infer. More precisely: The first rule moves a formula from N to X. 
The second and third rules delete or simplify the formula in X. The fourth to 
seventh rules delete or simplify other formulas using the formula in X. The eighth
rule moves a formula from X to P. The ninth rule moves a formula from P to 
Y. And the tenth rule moves a formula from Y to A and performs all inferences 
with formulas in A or otherwise ensures that the inferences are made redundant. 

Following Waldmann et al., the rules introducing new formulas—namely, the 
simplify rules and infer—allow adding arbitrary formulas to the state and are 
therefore not sound. Since the metatheorems are about completeness, there is no 
harm in allowing unsound steps, such as skolemization. If desired, soundness can 
be required simply by adding the assumption N ⊨ N' for each step N ⇝OL N'
in a derivation. 

Compared with most descriptions of the Otter loop in the literature, the 
above formalization (and Example 71 in Waldmann et al. [19] on which it is 
based) is abstract and nondeterministic, allowing arbitrary interleavings of deletions,
simplifications, and inferences. By not committing to a specific strategy,
we keep our code widely applicable: Our abstract Otter loop can be used as
the basis of refinement steps targeting a wide range of deterministic procedures
implementing specific strategies. We will see the same approach used for all the 
loops. We note that Bachmair and Ganzinger made a similar choice for their 
ordered resolution prover RP [3, Sect. 4]. 
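To make the abstract rules more tangible, here is a minimal Python sketch of one deterministic strategy that such a refinement could target. Everything in it is an assumption chosen for illustration and not part of the formalization: the calculus is propositional binary resolution, clauses are frozensets of nonzero integers, redundancy is limited to tautology deletion and subsumption, and the passive set is used first-in first-out. The names resolvents, is_tautology, and otter_loop are hypothetical.

```python
from collections import deque
from itertools import chain


def resolvents(c, d):
    """Binary resolvents of two clauses (frozensets of nonzero ints)."""
    return {(c - {lit}) | (d - {-lit}) for lit in c if -lit in d}


def is_tautology(c):
    return any(-lit in c for lit in c)


def otter_loop(clauses):
    """Saturate a clause set; returns the final active set A."""
    new = deque(frozenset(c) for c in clauses)  # N
    passive = deque()                           # P, used first-in first-out
    active = []                                 # A
    while new or passive:
        while new:
            x = new.popleft()                   # choose_n: N -> X
            # delete_fwd: X is a tautology or subsumed by a formula in P or A
            if is_tautology(x) or any(d <= x for d in chain(passive, active)):
                continue
            # delete_bwd_p, delete_bwd_a: X strictly subsumes a formula
            passive = deque(d for d in passive if not x < d)
            active = [d for d in active if not x < d]
            passive.append(x)                   # transfer: X -> P
        if not passive:
            break
        given = passive.popleft()               # choose_p: P -> Y
        active.append(given)                    # infer: Y -> A
        for d in active:                        # conclusions are added to N
            new.extend(resolvents(given, d))
    return active


unsat = otter_loop([{1, 2}, {-1, 2}, {1, -2}, {-1, -2}])
sat = otter_loop([{1, 2}, {-1}])
```

On the unsatisfiable input, the empty clause is derived and subsumes everything else, so unsat contains only frozenset(). The simplify rules are left out of this sketch; only the deletion rules, transfer, choose_p, and infer are exercised.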


Otter Loop with Fairness. Below we introduce a fair version of the Otter loop,
called ⇝OLf and formalized in Fair_Otter_Loop_Def.thy. This new version is
closer to an implementation.


inductive (⇝OLf) :: ('p, 'f) OLf_state ⇒ ('p, 'f) OLf_state ⇒ bool where
  choose_n: C ∉ N ⟹
    (N ∪ {C}, None, P, None, A) ⇝OLf (N, Some C, P, None, A)
| delete_fwd: C ∈ Red_F (elems P ∪ A) ∨ (∃C' ∈ elems P ∪ A. C' ⪯· C) ⟹
    (N, Some C, P, None, A) ⇝OLf (N, None, P, None, A)
| simplify_fwd: C' ≺S C ⟹ C ∈ Red_F (elems P ∪ A ∪ {C'}) ⟹
    (N, Some C, P, None, A) ⇝OLf (N, Some C', P, None, A)


| choose_p: P ≠ empty ⟹
    (∅, None, P, None, A) ⇝OLf
    (∅, None, remove (select P) P, Some (select P), A)
| infer: Inf_between A {C} ⊆ Red_I (A ∪ {C} ∪ M) ⟹
    (∅, None, P, Some C, A) ⇝OLf (M, None, P, None, A ∪ {C})

The definition of ⇝OLf differs from that of ⇝OL in two main respects:


— The set P is organized as some form of queue, with operations such as select, 
which chooses the queue’s next element; remove, which removes all occur- 
rences of an element from the queue; and elems, which returns the set of the 
queue’s elements. The queue is assumed to be fair, meaning that if select is 
called infinitely often, every element in the queue will eventually be chosen 
and the limit of P will be empty. 

— Simplification (e.g., in simplify_fwd) is allowed only if the simplified formula
C' is smaller than the original formula C according to some given well-founded
order ≺S. In practice, simplifications are usually well founded, so this is not
a severe restriction.


Also note that the state is now directly represented as a five-tuple (without the 
mediation of the state function), where the subsingletons are of type 'f option, 
with values of the forms None and Some C. 


Formula Queue. The queue that represents the passive formula set P is formalized
in its own file, Prover_Queue.thy. The file defines an abstract type of
queue and the operations on it (empty, select, add, remove, and elems). It also 
expresses the fairness assumption on the select function: 


If a sequence of queue operations starting from an empty queue contains 
infinitely many removals of the selected element, then the queue is empty 
at the limit. 


Moreover, the file contains an example implementation of the queue as a 
FIFO queue. This ensures that the abstract requirements on the queue, including 
fairness, are satisfiable. 
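The queue interface and its fairness property can be illustrated outside Isabelle. The following Python sketch is only an analogy under stated assumptions: the class FifoQueue and the helper drain are hypothetical names, add is assumed to append at the back, and the finite run can only hint at the fairness property, which speaks about infinite sequences of operations.

```python
from collections import deque


class FifoQueue:
    """Passive set as a first-in first-out queue (a sketch, not the Isabelle code)."""

    def __init__(self):
        self._items = deque()

    def is_empty(self):
        return not self._items

    def select(self):
        return self._items[0]          # oldest element; requires a nonempty queue

    def add(self, elem):
        self._items.append(elem)       # newest element goes to the back

    def remove(self, elem):
        # Removes every occurrence of elem, as described in the text.
        self._items = deque(x for x in self._items if x != elem)

    def elems(self):
        return set(self._items)


def drain(queue, late_additions):
    """Repeatedly remove the selected element, adding one late element per step."""
    order = []
    late = list(late_additions)
    while not queue.is_empty():
        chosen = queue.select()
        queue.remove(chosen)
        order.append(chosen)
        if late:
            queue.add(late.pop(0))
    return order


q = FifoQueue()
for c in ["C1", "C2", "C3"]:
    q.add(c)
selection_order = drain(q, ["C4", "C5"])
```

Because new elements go to the back, an element that is already in the queue can be overtaken only by the finitely many elements ahead of it, which is the intuition behind the fairness of a FIFO discipline.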


iProver Loop with Fairness. To obtain an iProver loop from an Otter loop, 
only one extra rule is needed. The fair version of the iProver loop is formalized 
in Fair_iProver_Loop.thy as follows: 


inductive (⇝ILf) :: ('p, 'f) OLf_state ⇒ ('p, 'f) OLf_state ⇒ bool where
  ol: St ⇝OLf St' ⟹ St ⇝ILf St'
| red_by_children: C ∈ Red_F (A ∪ M) ∨ M = {C'} ∧ C' ≺· C ⟹
    (∅, None, P, Some C, A) ⇝ILf (M, None, P, None, A)


The first rule, ol, executes any ⇝OLf rule as an iProver loop rule. The second
rule, red_by_children, replaces a formula C by a set of formulas M that make
it redundant. As M, iProver would use a set of simplified formulas produced by
inferences with C as a premise and formulas from A ∪ {C} as further premises.
The rule is stated in a more general, unsound form.

We prove the main metatheorem first for the iProver loop. Then, since 
an Otter derivation is an iProver derivation (in which the second rule, 
red_by_children, is not used), the result carries over directly to the Otter loop.
The Isabelle statement, located in Fair_iProver_Loop.thy, is as follows: 
theorem fair_IL_Liminf_saturated
  assumes
    full_chain (⇝ILf) Sts and
    is_initial_OLf_state (Sts ! 0)
  shows saturated (Liminf Sts)


Informally, this states that if Sts is a complete ⇝ILf derivation starting in a
state of the form (N, None, empty, None, ∅), then the limit is saturated. The limit
(strictly speaking, limit inferior) is defined by


Liminf Xs = ⋃_{i} ⋂_{j ≥ i} Xs ! j


where Xs ! j returns the element at index j of the finite or infinite sequence Xs. 
In Isabelle, such sequences are represented by the type ‘a llist of “lazy lists.” 
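For finite sequences, the limit inferior can be computed directly from its definition as a union, over all start indices i, of the intersection of the elements from index i onward. The Python sketch below does exactly that; the function name liminf is hypothetical, and the infinite case, which is the interesting one for the metatheorems, cannot be computed this way.

```python
def liminf(xs):
    """Limit inferior of a finite sequence of sets: union over i of the
    intersection of xs[j] for all j >= i."""
    result = set()
    for i in range(len(xs)):
        tail = set(xs[i])
        for j in range(i + 1, len(xs)):
            tail &= xs[j]
        result |= tail
    return result


states = [{"C1", "C2"}, {"C2", "C3"}, {"C2", "C4"}]
limit = liminf(states)
```

For a nonempty finite sequence, the result coincides with the last element, since the intersection starting at the last index is the last element itself and every other intersection is contained in it. This matches the proof sketches, where finite derivations are handled by looking at the final state.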

This metatheorem is proved within the scope of the passive set queue’s fairness
assumption. It is derived from the metatheorem about the transition system
⇝IL without fairness, which is inherited from the abstract procedure GC.


Proof Sketch. The main difficulty is to show that N, X, P, and Y are empty at 
the limit. Once this is shown, we can apply the main metatheorem for GC, which 
states that if there are no passive formulas at the limit, the active formula set is 
saturated. 

Let St_0 ⇝ILf St_1 ⇝ILf ⋯ be a complete derivation, where St_i =
(N_i, X_i, P_i, Y_i, A_i). If the derivation is finite, it is easy to show that the final
state, and hence the limit, must be of the form (∅, None, empty, None, A), as
desired.

Otherwise, for an infinite derivation, we assume in turn that the limit of N, X,
P, or Y is nonempty and show that this leads to a contradiction. We start with
N. Let i be an index such that N_i ∩ N_{i+1} ∩ ⋯ ≠ ∅, which exists by the definition
of limit. This means that N_i, N_{i+1}, … are all nonempty. By invariance, we can
show that Y_i, Y_{i+1}, … are all empty. Thus, if we have a transition from St_j to
St_{j+1} for j ≥ i, it cannot be infer (via ol) or red_by_children. It can be shown
that for the remaining transition rules, we have St_i ⊐1 St_{i+1} ⊐1 ⋯, where ⊐1 is
the converse of the lexicographic combination ⊏1 of three well-founded relations:


— the multiset extension ≺≺S of ≺S on entire states, i.e., on unions N ∪ X ∪
P ∪ Y ∪ A;

— as a tiebreaker, ≺≺S on N components;

— as a further tiebreaker, ≺≺S on X components.


Since the lexicographic combination of well-founded relations is well founded,
the chain St_i ⊐1 St_{i+1} ⊐1 ⋯ cannot be infinite, a contradiction.

Next, we consider the X component. If X is nonempty forever, the only
possible transition rules are deletions and simplifications, and both make the
entire state decrease with respect to ≺≺S. Again, we get a contradiction.

Next, we consider the P component. The fairness assumption for the queue
guarantees that P is empty at the limit, provided that the choose_p rule
is executed infinitely often. Since P is assumed not to be empty at the limit,
choose_p must be executed only finitely often. Let i be an index from which no
choose_p step takes place. We then have St_i ⊐2 St_{i+1} ⊐2 ⋯, where ⊐2 is the
converse of the lexicographic combination ⊏2 of two well-founded relations:


— ≺≺S on Y components;
— as a tiebreaker, the relation ⊏1 on entire states.


Again, we get a contradiction.

Finally, for Y, the only two transitions possible, infer and red_by_children,
lead to a state where Y is empty afterward, contradicting the hypothesis that Y
is nonempty forever.


4 DISCOUNT Loop 


The DISCOUNT loop [1] works on four-tuples (T, P, Y, A), where T is the set of
scheduled (“to do”) inferences; P is the set of so-called passive formulas (although,
strictly speaking, any formula in Y is also passive); Y is a subsingleton storing
the given clause; and A is the set of active formulas. All the sets are finite.

Initial states have the form (∅, P, ∅, ∅). Inferences are assumed to be finitary.
We disallow premiseless inferences. Waldmann et al. [19, Example 81] allow
them and let the T component of initial states consist of all of them. However,
in their “reasonable strategy,” they implicitly assume that T is finite, in which
case premiseless inferences can be immediately performed and replaced by the
resulting formulas inserted in P.


DISCOUNT Loop without Fairness. The first version of the DISCOUNT
loop, formalized in DISCOUNT_Loop.thy, does not make any fairness assumption
on the choice of the inference to compute or the given clause. There is no
guarantee that T, P, and Y are empty at the limit, but if they are, then A is saturated
at the limit. Here is an extract of the definition, omitting the delete_bwd
and simplify_fwd rules:


inductive (⇝DL) :: 'f inference set × ('f × DL_label) set ⇒
    'f inference set × ('f × DL_label) set ⇒ bool
where
  compute_infer: ι ∈ Red_I (A ∪ {C}) ⟹
    state (T ∪ {ι}, P, ∅, A) ⇝DL state (T, P, {C}, A)
| choose_p: state (T, P ∪ {C}, ∅, A) ⇝DL state (T, P, {C}, A)
| delete_fwd: C ∈ Red_F A ∨ (∃C' ∈ A. C' ⪯· C) ⟹
    state (T, P, {C}, A) ⇝DL state (T, P, ∅, A)


| simplify_bwd: C' ∈ Red_F {C, C''} ⟹
    state (T, P, {C}, A ∪ {C'}) ⇝DL state (T, P ∪ {C''}, {C}, A)
| schedule_infer: T' = Inf_between A {C} ⟹
    state (T, P, {C}, A) ⇝DL state (T ∪ T', P, ∅, A ∪ {C})
| delete_orphan_infers: T' ∩ Inf_from A = ∅ ⟹
    state (T ∪ T', P, Y, A) ⇝DL state (T, P, Y, A)

The state function converts a four-tuple (T, P, Y, A) into a pair (T, N), where N
is a set of labeled formulas. The labels are Passive (for P), YY (for Y), and Active
(for A, corresponding to active in LGC). The rules compute_infer, schedule_infer,
and delete_orphan_infers refine the LGC rules of the same names; the other rules
refine process.


DISCOUNT Loop with Fairness. In the fair version of the DISCOUNT
loop, formalized in Fair_DISCOUNT_Loop.thy, the scheduled inferences and
the passive formulas are organized as a single queue. A state is then a triple
(P, Y, A), where P is the single queue that merges T and P from the above
DISCOUNT loop, and Y and A are as above. Elements of P have the forms
Passive_Inference ι and Passive_Formula C. The select function of P is assumed
to be fair: If select is called infinitely often, every element in the queue will
eventually be chosen and the limit of P will be empty.

The definition of the transition system is as follows: 


inductive (⇝DLf) :: ('p, 'f) DLf_state ⇒ ('p, 'f) DLf_state ⇒ bool where
  compute_infer: P ≠ empty ⟹ select P = Passive_Inference ι ⟹
      ι ∈ Red_I (A ∪ {C}) ⟹
    (P, None, A) ⇝DLf (remove (select P) P, Some C, A)
| choose_p: P ≠ empty ⟹ select P = Passive_Formula C ⟹
    (P, None, A) ⇝DLf (remove (select P) P, Some C, A)
| delete_fwd: C ∈ Red_F A ∨ (∃C' ∈ A. C' ⪯· C) ⟹
    (P, Some C, A) ⇝DLf (P, None, A)


| simplify_bwd: C' ∉ A ⟹ C'' ≺S C' ⟹ C' ∈ Red_F {C, C''} ⟹
    (P, Some C, A ∪ {C'}) ⇝DLf (add (Passive_Formula C'') P, Some C, A)
| schedule_infer: set ιs = Inf_between A {C} ⟹
    (P, Some C, A) ⇝DLf
    (fold (add ∘ Passive_Inference) ιs P, None, A ∪ {C})
| delete_orphan_infers: ιs ≠ [] ⟹ set ιs ⊆ passive_inferences_of P ⟹
      set ιs ∩ Inf_from A = ∅ ⟹
    (P, Y, A) ⇝DLf (fold (remove ∘ Passive_Inference) ιs P, Y, A)


We note the following: 


— Inferences are added to P by schedule_infer. An inference can be deleted
by delete_orphan_infers if one of the premises has been removed since the
inference was scheduled.

— The next element from P is chosen by compute_infer or choose_p, depending
on whether it is of the form Passive_Inference ι or Passive_Formula C.

— Formulas are added to P by simplify_bwd.


As with the Otter and iProver loops, the most important result is saturation 
at the limit: 
theorem fair_DL_Liminf_saturated
  assumes
    full_chain (⇝DLf) Sts and
    is_initial_DLf_state (Sts ! 0)
  shows saturated (labeled_formulas_of (Liminf_fstate Sts))


Proof Sketch. The proof amounts to showing that the sets P and Y are empty
at the limit. This is easy to show for finite derivations, so we focus on infinite
ones. We proceed by contradiction. For P, the fairness assumption for the select
function of the queue guarantees that P is empty at the limit, provided
that the compute_infer and choose_p rules are collectively executed infinitely
often. Since P is assumed not to be empty at the limit, these two rules must
be executed only finitely often. Let i be an index from which no compute_infer
or choose_p step takes place. We then have St_i ⊐ St_{i+1} ⊐ ⋯, where ⊐ is the
converse of the lexicographic combination ⊏ of two well-founded relations:


— < on the cardinality of Y components (0 or 1);
— as a tiebreaker, the multiset extension ≺≺S of ≺S on unions P ∪ Y ∪ A.


Since the lexicographic combination of well-founded relations is well founded,
the chain St_i ⊐ St_{i+1} ⊐ ⋯ cannot be infinite, a contradiction.

Finally, we consider Y. If Y is nonempty forever, the only possible transitions
make the entire state decrease with respect to ⊏. This yields a contradiction.


5 Zipperposition Loop 


The Zipperposition loop [17] as described by Waldmann et al. [19, Example 82] 
works on four-tuples (T, P, Y, A), where the components have the same roles as 
in the DISCOUNT loop: T is the scheduled set, P is the passive set, Y is the 
given clause, if any, and A is the active set. For technical reasons, we need to 
enrich the state with a ghost component D (“done”), of type 'f inference set,
resulting in a five-tuple (T, D, P, Y, A). All the sets are finite. 

The hallmark of the Zipperposition loop is that it can handle infinitary inferences.
We assume that Inf_between A {C} is countable if A is finite. (This
assumption is implicit in Waldmann et al.) To store the infinitely many con- 
clusions of an inference, T contains possibly infinite sequences of inferences, 
instead of individual inferences. Premiseless inferences are also allowed. Initial 
states have the form (T, ∅, P, ∅, ∅), where T contains all the premiseless inferences
of the underlying proof calculus and only those. 

The implementation in Zipperposition by Vukmirović et al. [17] deviates from 
Waldmann et al. in one important respect: Instead of sequences of inferences, 
Zipperposition works with sequences of subsingletons of inferences. The special 
value Ø is returned when no progress is made in computing an inference, to give 
control back to the given clause procedure. In the setting of Waldmann et al., 
this special value can be replaced by a tautology (e.g., ⊤ or ⊤ ≈ ⊤), which the
given clause procedure can delete as redundant. 
Zipperposition Loop without Fairness. The first version of the Zipperposition
loop, formalized in Zipperposition_Loop.thy, does not make any fairness
assumption on the choice of the inference to compute or the given clause. Here
is an extract of the definition:


inductive (⇝ZL) :: 'f inference set × ('f × DL_label) set ⇒
    'f inference set × ('f × DL_label) set ⇒ bool
where
  compute_infer: ι0 ∈ Red_I (A ∪ {C}) ⟹
    zl_state (T + {LCons ι0 ιs}, D, P, ∅, A) ⇝ZL
    zl_state (T + {ιs}, D ∪ {ι0}, P ∪ {C}, ∅, A)
| choose_p: zl_state (T, D, P ∪ {C}, ∅, A) ⇝ZL zl_state (T, D, P, {C}, A)
| delete_fwd: C ∈ Red_F A ∨ (∃C' ∈ A. C' ⪯· C) ⟹
    zl_state (T, D, P, {C}, A) ⇝ZL zl_state (T, D, P, ∅, A)


| schedule_infer: inferences_of T' = Inf_between A {C} ⟹
    zl_state (T, D, P, {C}, A) ⇝ZL
    zl_state (T + T', D − inferences_of T', P, ∅, A ∪ {C})
| delete_orphan_infers: set ιs ∩ Inf_from A = ∅ ⟹
    zl_state (T + {ιs}, D, P, Y, A) ⇝ZL zl_state (T, D ∪ set ιs, P, Y, A)


The zl_state function converts a five-tuple (T, D, P, Y, A) into a pair (U, N),
where 


— U consists of all the inferences contained in T minus those in D (formally 
written inferences_of T — D); and 
— N is a set of labeled formulas corresponding to P, Y, and A. 


We use a multiset for the T component. Waldmann et al. use a set, but this is 
not very realistic because an implementation cannot in general detect duplicate 
infinite sequences. 

The D component addresses a subtle issue in Waldmann et al. If we did not
subtract D in the definition of U, the completeness theorem we would obtain
from the LGC layer above would require the T component to be empty at the
limit. However, a given inference ι might appear in T multiple times and hence
always be present, even if we keep on removing copies of it, if new copies are
continuously added. The issue goes away if we add ι to D whenever we compute it, in
compute_infer; then the inference is not present in U (i.e., inferences_of T − D).
In other words, computing an inference makes it momentarily disappear, even if
there are multiple copies of it in T.
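The effect of the ghost component can be replayed on a tiny example. The Python sketch below is an illustration under simplifying assumptions, not the Isabelle definition: sequences are finite tuples rather than lazy lists, T is a collections.Counter used as a multiset, and the functions visible, compute_infer, and schedule_infer are hypothetical stand-ins that only track the T and D components.

```python
from collections import Counter


def inferences_of(T):
    """Set of inferences occurring in a multiset T of sequences (tuples here)."""
    return {inf for seq in T for inf in seq}


def visible(T, D):
    """The U component: inferences_of T minus D."""
    return inferences_of(T) - D


def compute_infer(T, D, seq):
    """Consume the head of one copy of seq and record it in D."""
    assert T[seq] > 0 and seq
    head, rest = seq[0], seq[1:]
    return T - Counter([seq]) + Counter([rest]), D | {head}


def schedule_infer(T, D, new_seqs):
    """Add new sequences to T and drop their inferences from D."""
    added = Counter(new_seqs)
    return T + added, D - inferences_of(added)


T, D = Counter({("i",): 2}), set()      # two copies of a one-element sequence
before = visible(T, D)                  # the inference "i" is in U
T, D = compute_infer(T, D, ("i",))      # compute one copy
during = visible(T, D)                  # "i" has left U ...
still_in_T = "i" in inferences_of(T)    # ... although a copy remains in T
T, D = schedule_infer(T, D, [("i",)])   # a new copy is scheduled
after = visible(T, D)                   # "i" is back in U
```

The run shows the point made above: without D, the inference would be present in every state, whereas with D it is absent from U right after it has been computed, which is what the limit argument needs.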

Admittedly, it is not easy to develop a robust intuitive understanding of how 
D works, but what matters ultimately is that D allows us to obtain a usable main 
metatheorem. The metatheorem states that if the set of scheduled inferences and 
the set of passive formulas are empty at the limit of a derivation starting in an 
initial state, the active formula set is saturated at the limit. We will also see, via
an additional refinement layer, that the ghost component is truly a ghost and
can be omitted once it has served its purpose. 


Zipperposition Loop with Fairness. Unlike the fair DISCOUNT loop, the
fair Zipperposition loop, formalized in Fair_Zipperposition_Loop.thy, keeps T
and P separate. An extract of the Isabelle definition follows:


inductive (⇝ZLf) :: ('t, 'p, 'f) ZLf_state ⇒ ('t, 'p, 'f) ZLf_state ⇒ bool
where
  compute_infer: (∃ιs ∈ t_llists T. ιs ≠ LNil) ⟹
      t_pick_elem T = (ι0, T') ⟹ ι0 ∈ Red_I (A ∪ {C}) ⟹
    (T, D, P, None, A) ⇝ZLf (T', D ∪ {ι0}, p_add C P, None, A)
| choose_p: P ≠ p_empty ⟹
    (T, D, P, None, A) ⇝ZLf
    (T, D, p_remove (p_select P) P, Some (p_select P), A)
| delete_fwd: C ∈ Red_F A ∨ (∃C' ∈ A. C' ⪯· C) ⟹
    (T, D, P, Some C, A) ⇝ZLf (T, D, P, None, A)


| schedule_infer: inferences_of ιss = Inf_between A {C} ⟹
    (T, D, P, Some C, A) ⇝ZLf
    (fold t_add_llist ιss T, D − inferences_of ιss, P, None, A ∪ {C})

| delete_orphan_infers: ιs ∈ t_llists T ⟹ set ιs ∩ Inf_from A = ∅ ⟹
    (T, D, P, Y, A) ⇝ZLf (t_remove_llist ιs T, D ∪ set ιs, P, Y, A)


The presence of two queues introduces some complications. Waldmann et
al. [19, Example 82] claim that “to produce fair derivations, a prover needs
to choose the sequence in ComputeInfer fairly and to choose the formula in
ChooseP fairly.” However, this does not suffice: A counterexample would apply
compute_infer infinitely often in a fair fashion, retrieving elements from some
infinite sequences, without ever applying choose_p (whose choice of formula
would then be vacuously fair). The solution is to add a fairness assumption stating
that compute_infer is applied at most finitely many times before choose_p
is applied, or, in other words, that if compute_infer is applied infinitely often,
then so is choose_p. This leads to the following main metatheorem:


theorem fair_ZL_Liminf_saturated:
  assumes
    full_chain (⇝ZLf) Sts and
    is_initial_ZLf_state (Sts ! 0) and
    infinitely_often compute_infer_step Sts ⟶
    infinitely_often choose_p_step Sts
  shows saturated (labeled_formulas_of (Liminf_zl_fstate Sts))
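One simple way for an implementation to meet the third assumption is to bound the number of consecutive compute_infer steps. The Python sketch below illustrates this idea only at the level of rule names; bounded_schedule and max_compute_run are hypothetical helpers, the bound k is an arbitrary parameter, and a real prover would additionally have to respect the side conditions of each rule (for example, choose_p needs a nonempty passive queue).

```python
from itertools import islice


def bounded_schedule(k):
    """Yield rule names forever: at most k compute_infer steps, then one choose_p."""
    while True:
        for _ in range(k):
            yield "compute_infer"
        yield "choose_p"


def max_compute_run(steps):
    """Longest run of compute_infer steps not interrupted by a choose_p step."""
    longest = run = 0
    for step in steps:
        if step == "compute_infer":
            run += 1
        elif step == "choose_p":
            run = 0
        longest = max(longest, run)
    return longest


prefix = list(islice(bounded_schedule(3), 20))
longest_run = max_compute_run(prefix)
choose_p_count = prefix.count("choose_p")
```

Any infinite schedule whose runs of compute_infer are bounded in this way applies choose_p infinitely often whenever it applies compute_infer infinitely often, which is the shape of the assumption in the theorem.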


Proof Sketch. Recall that zl_state maps (T, D, P, Y, A) to a pair (U, N). In the
abstract LGC layer, U and the passive subset of N are required to be empty at
the limit. To obtain the same effect in ⇝ZLf, we must show that the sets U, P,
and Y are empty at the limit. This is easy to show for finite derivations, so we
focus on infinite ones. We proceed by contradiction.

We start with U. We first show that there must be infinitely many
compute_infer steps. Assume that there are finitely many. Then there exists an
index i from which no more compute_infer steps take place. We then have
St_i ⊐ St_{i+1} ⊐ ⋯, where ⊐ is the converse of the lexicographic combination ⊏
of four well-founded relations:


— the multiset extension ≺≺S of ≺S on unions P ∪ Y ∪ A;

— as a tiebreaker, ≺≺S on P components;

— as a further tiebreaker, ≺≺S on Y components;

— as a further tiebreaker, < on the cardinality of T components.


We get a contradiction. Having shown that there are infinitely many
compute_infer steps, we exploit the queue’s fairness to show that one of these steps
will choose any given inference ι from the queue. Thanks to the D trick, ι will
then momentarily vanish from U, ensuring that it is not in the limit. The same
argument applies for any inference ι, showing that U is empty at the limit.

Next, we show that P is empty at the limit. We start by showing that there
must be infinitely many choose_p steps. Assume that there are finitely many.
Then, by the third assumption, there must be finitely many compute_infer steps
as well. Let i be an index from which no more compute_infer steps take place.
We then have St_i ⊐ St_{i+1} ⊐ ⋯, as above, yielding a contradiction.

Finally, we show that Y is empty at the limit. Let i be an index such that
Y_i ∩ Y_{i+1} ∩ ⋯ ≠ ∅. Since a compute_infer step is possible only if Y is empty,
no such steps are possible from index i. Again, we have St_i ⊐ St_{i+1} ⊐ ⋯, a
contradiction.


Queue of Formula Sequences. The queue data structure used for the T component
of the Zipperposition loop needs to store a finite number of possibly infinite
sequences of inferences. It is formalized in Prover_Lazy_List_Queue.thy.
It provides the following operations on abstract queue and element types 'q and
'e:


fixes
  empty :: 'q and
  add_llist :: 'e llist ⇒ 'q ⇒ 'q and
  remove_llist :: 'e llist ⇒ 'q ⇒ 'q and
  pick_elem :: 'q ⇒ 'e × 'q and
  llists :: 'q ⇒ 'e llist multiset


The fairness requirement on implementations of the abstract queue interface 
takes the following form: 


If a sequence of queue operations contains infinitely many pick_elem steps
and ι is at the head of one of the sequences stored in the queue, then
either the sequence will be entirely removed (by orphan deletion) or ι will
eventually be chosen.

A syntactically stronger formulation of fairness, where ι may occur anywhere in
a sequence, is derived as a corollary:


If a sequence of queue operations contains infinitely many pick_elem steps
and ι occurs in one of the sequences stored in the queue at some index
in the sequence, then either the sequence (possibly amputated from its
leading elements) will be entirely removed or ι will eventually be chosen.


As a proof of concept, the theory file contains an example implementation of 
the queue as a FIFO queue. The proof that this FIFO queue is fair is the most 
finicky proof of our entire development. 
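A queue of possibly infinite sequences can be mimicked in Python with iterators. The sketch below is an assumption-laden analogy and not the Isabelle implementation: the class LazyListQueue is hypothetical, lazy lists are modeled as Python iterators, the discipline shown (take the head of the oldest sequence, then move its tail to the back) is one way to read "FIFO" in this setting, and remove_llist is omitted.

```python
from collections import deque
from itertools import count

_EMPTY = object()  # sentinel marking an exhausted sequence


class LazyListQueue:
    """Round-robin queue of lazy sequences (a sketch with hypothetical names)."""

    def __init__(self):
        self._seqs = deque()

    def add_llist(self, seq):
        self._seqs.append(iter(seq))

    def pick_elem(self):
        """Return the head of the oldest nonempty sequence; its tail goes to the back."""
        while self._seqs:
            it = self._seqs.popleft()
            head = next(it, _EMPTY)
            if head is not _EMPTY:
                self._seqs.append(it)
                return head
        raise LookupError("no nonempty sequence in the queue")


q = LazyListQueue()
q.add_llist(("a", n) for n in count())   # an infinite sequence
q.add_llist([("b", 0), ("b", 1)])        # a finite sequence
picks = [q.pick_elem() for _ in range(6)]
```

Because every pick moves the chosen sequence behind all the others, the head of any stored sequence is reached after finitely many picks, even though one of the sequences never runs out. This is the behavior the fairness requirement above asks for.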


Zipperposition Loop without Ghost Fields. In the last step of our development,
we remove the D state component. D is useful to retrieve a usable
main metatheorem for ⇝ZL, but it is not explicitly referenced in the metatheorem
for the fair variant ⇝ZLf. The resulting transition system ⇝ZLfw,
formalized in Fair_Zipperposition_Loop_without_Ghosts.thy, operates on
four-tuples (T, P, Y, A). Each transition is identical to the corresponding ⇝ZLf
transition, omitting the D component. The main metatheorem is also essentially the
same.


6 Conclusion 


We presented an Isabelle/HOL formalization of four variants of the given clause 
procedure, starting from Tourret and Blanchette’s formalization of two abstract 
given clause procedures [16]. We relied extensively on stepwise refinement to 
derive properties of more concrete transition systems from more abstract ones. 

Our main findings concern the Zipperposition loop. We found that the refine- 
ment proof is not as straightforward as previously thought [19, Example 82] and 
requires a nontrivial abstraction function. In addition, we discovered a fairness 
condition—the necessity of avoiding computing inferences forever without select- 
ing a formula—that was not mentioned before in the literature, and we clarified 
other fine points. 
Abstract. This paper presents and proves totally correct a new algorithm,
called QSMA, for the satisfiability of a quantified formula modulo
a complete theory and an initial assignment. The optimized variant of 
QSMA implemented in YicesQS is described and shown to preserve total 
correctness. A report on the performance of YicesQS at the 2022 SMT 
competition is included. YicesQS ran in the LIA, NIA, LRA, NRA, and BV 
categories and ranked second for the “largest contribution” award (single 
queries). It was the only solver to solve all LRA instances, where it was 
about two orders of magnitude faster than the second best solver (Z3). 


1 Introduction 


Applications of automated reasoning generate formulas involving both quanti- 
fiers and symbols defined in background theories. For example, software verifica- 
tion needs reasoners that decide the satisfiability of quantified formulas modulo 
theories such as data structures and arithmetic (e.g., [20]). Therefore, endowing 
SMT solvers with quantifier reasoning (e.g., [3,9, 11-14, 22]), enriching first-order 
theorem provers with built-in theories (e.g., [1,2,19]), and integrating provers 
and solvers [7], are major research objectives. 

If there is a single background theory T, the T-satisfiability of quantified
formulas can be reduced to that of quantifier-free formulas if T admits quantifier
elimination (QE): for every formula φ there exists a quantifier-free formula F
that is T-equivalent to φ. Since computing F can be prohibitively expensive
(e.g., exponential in linear rational arithmetic (LRA) and doubly exponential in
linear integer arithmetic (LIA) [8]), QE is not a practical solution.

In this paper we propose a practical solution in the form of a new algorithm
called QSMA. In QSMA the computation of quantifier-free model-based
under-approximations (MBU) and model-based over-approximations (MBO) of
quantified formulas embodies a lazy approach to QE, which is tailored for
T-satisfiability. MBU generates a quantifier-free implicant of the given formula
that is true in the given model. Model(-guided) generalization for linear [12] and
nonlinear real arithmetic (NRA) [17] is an instance of MBU. MBO generates a
quantifier-free implied formula that is false in the given model. Model interpolation
for NRA [17] is an instance of MBO.


© The Author(s) 2023
B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 78-95, 2023.
https://doi.org/10.1007/978-3-031-38499-8_5

The QSMA algorithm assumes that the theory T is complete. By its recur- 
sive nature, QSMA solves a generalized form of the satisfiability problem, called 
quantified SMA (satisfiability modulo theory and assignment): given a formula φ
with arbitrary quantification, and an initial assignment to Boolean or first-order
subterms of φ, find a theory model of φ that extends the initial assignment, or
report that none exists. In addition to QSMA and its total correctness, we present 
an optimized variant named OptiQSMA, which preserves total correctness and 
is implemented in the YicesQS solver built on top of Yices 2. A report on exper- 
imental results from the 2022 SMT competition and a discussion complete the 
paper. We begin with a high-level view of QSMA. 


1.1 High-Level View of the QSMA Algorithm 


The QSMA algorithm works by progressively instantiating quantified variables.
Consider a formula φ of the form ∃x̄_1.∀x̄_2.∃x̄_3 … F[x̄_1, x̄_2, x̄_3, …] where F is
quantifier-free. For example, suppose the theory is LRA, φ = ∃x.∀y.∃z.F and
F = z ≥ 0 ∧ x ≥ 0 ∧ y + z ≥ 0. Say that QSMA assigns x←0. Whatever
value is chosen for y, the algorithm can show that φ is true in LRA by assigning
z←max(0, −y). If F = z ≥ 0 ∧ x ≥ 0 ∧ y + z ≤ 0, no matter which (non-negative)
value QSMA chooses for x, it can show that φ is false in LRA by picking y←1,
because there is no value for z that satisfies z ≥ 0 ∧ z ≤ −1.
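The two LRA examples can be spot-checked numerically. The Python sketch below reads the first formula as F = z ≥ 0 ∧ x ≥ 0 ∧ y + z ≥ 0 and the second as F = z ≥ 0 ∧ x ≥ 0 ∧ y + z ≤ 0; this reading, the function names F1 and F2, and the sample grid are assumptions made for illustration. Sampling rationals is of course not a proof and has nothing to do with how QSMA itself proceeds.

```python
from fractions import Fraction


def F1(x, y, z):
    return z >= 0 and x >= 0 and y + z >= 0


def F2(x, y, z):
    return z >= 0 and x >= 0 and y + z <= 0


# A small grid of rationals from -3 to 3 in steps of 1/2.
samples = [Fraction(n, 2) for n in range(-6, 7)]

# First formula: with x := 0, the witness z := max(0, -y) works for every sampled y.
witness_works = all(F1(0, y, max(0, -y)) for y in samples)

# Second formula: with y := 1, no sampled x and z satisfy F2,
# because F2 would require both z >= 0 and z <= -1.
counterexample_works = not any(F2(x, 1, z) for x in samples for z in samples)
```

The first check mirrors the existential player answering any y with z = max(0, −y); the second mirrors the universal player's choice y = 1, which leaves no admissible z.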

For an example that is not in prenex normal form, consider a formula φ of
the form ∃x.((∀y_1.F_1[x, y_1]) ⇒ (∀y_2.F_2[x, y_2])), where F_1 and F_2 are
quantifier-free. QSMA sees the formula as ∃x.((∃y_1.¬F_1[x, y_1]) ∨ (¬∃y_2.¬F_2[x, y_2])), and
then as ∃x.(p_1 ∨ ¬p_2), where p_1 and p_2 are proxy Boolean variables for the
quantified subformulas. QSMA assigns values to x, p_1, and p_2. If p_1 is assigned
true, the algorithm tries to extend the assignment with a value for y_1 that satisfies
¬F_1[x, y_1]. If p_2 is assigned false, the algorithm tries to show that there is no
value for y_2 that satisfies ¬F_2[x, y_2].

Without loss of generality (¬¬ converts ∀ into ¬∃¬), we consider formulas


F[z̄, x̄, p̄] denotes a quantifier-free formula where the variables z̄, x̄, and p̄ occur.
Tuples z̄ and x̄ contain the first-order variables occurring free in F. Formula
F is quantifier-free because the quantified subformulas φ_i = ∃ȳ_i.G_i[z̄, x̄, ȳ_i] are
replaced by proxy Boolean variables p̄ = p_1, …, p_k. Given an initial assignment
to the free variables z̄, we construct a QSMA-tree for φ. QSMA starts trying to
satisfy F[z̄, x̄, p̄]. If it fails, it means that φ is false under the initial assignment.
If it succeeds, there are two cases. If k = 0, formula φ is true under the initial
assignment. If k > 0, the algorithm descends recursively to consider the
QSMA-subtrees for the φ_i subformulas (1 ≤ i ≤ k). If QSMA assigned true to p_i, it
tries to show that φ_i is true. If QSMA assigned false to p_i, it tries to show that
φ_i is false. If it succeeds for all QSMA-subtrees, formula φ is true under the
initial assignment. For this, the model built by QSMA should satisfy F[z̄, x̄, p̄] ∧
⋀_{i=1}^{k} (p_i ⇔ φ_i). Otherwise, formula φ is false under the initial assignment.


2 Preliminaries 


A signature $\Sigma$ is given by a set $S$ of sorts and a set of sorted symbols. Given
a class $\mathcal{V} = (\mathcal{V}^s)_{s \in S}$ of disjoint sets of sorted variables, $\Sigma[\mathcal{V}]$-formulas,
$\Sigma$-sentences, and $\Sigma[\mathcal{V}]$-interpretations are defined as usual. A $\Sigma$-structure is a
$\Sigma[\emptyset]$-interpretation. We use $x$, $y$, $z$ for first-order variables, $p$ for Boolean ones,
and $\bar{x}$, $\bar{y}$, $\bar{z}$, and $\bar{p}$ for tuples of such variables. We also use $\varphi$ and $\psi$ for formulas, $F$
and $G$ for quantifier-free formulas, $M$ for interpretations, $\models$ for satisfaction and
entailment, $=$ for identity, $\uplus$ for disjoint union, and $\setminus$ for set difference. $FV(\varphi)$ is
the set of the variables occurring free in $\varphi$. Slightly abusing the notation, $FV(\varphi)$
is also treated as a tuple. Implication is written $\Rightarrow$ and logical equivalence is
written $\Leftrightarrow$. If $\mathcal{V}_1 \subseteq \mathcal{V}_2$ (i.e., $\mathcal{V}_1^s \subseteq \mathcal{V}_2^s$ for all $s \in S$), a $\Sigma[\mathcal{V}_2]$-interpretation $M_2$ is
an extension of a $\Sigma[\mathcal{V}_1]$-interpretation $M_1$ to $\mathcal{V}_2$, if $M_2$ interprets the variables
in $\mathcal{V}_2^s \setminus \mathcal{V}_1^s$ for all $s \in S$ and is otherwise identical to $M_1$.

A theory $\mathcal{T}$ is defined by a signature $\Sigma$ and a set of $\Sigma$-sentences called
$\mathcal{T}$-axioms. A model of $\mathcal{T}$, or $\mathcal{T}$-model, is a $\Sigma$-structure that satisfies the $\mathcal{T}$-axioms. A
$\mathcal{T}[\mathcal{V}]$-model is a $\Sigma[\mathcal{V}]$-interpretation that is a $\mathcal{T}$-model when the interpretation
of variables is ignored. A theory $\mathcal{T}$ is complete, if it is consistent, and for all
$\Sigma$-sentences $F$, either $F$ or $\neg F$ is provable from the $\mathcal{T}$-axioms. In this paper
we deal with a single theory $\mathcal{T}$ that has a unique $\mathcal{T}$-model $M_0$, so that the
interpretation of everything except variables is fixed. Therefore $\mathcal{T}$ is complete,
for $\Sigma$-sentences $\mathcal{T}$-validity, $\mathcal{T}$-satisfiability, and truth in $M_0$ coincide, all
$\mathcal{T}[\mathcal{V}]$-models are extensions of $M_0$, and a $\mathcal{T}$-satisfiability procedure is concerned only
with assignments to variables. Since there are one theory and one signature,
we write formula for $\Sigma[\mathcal{V}]$-formula and model for $\mathcal{T}$-model or $\mathcal{T}[\mathcal{V}]$-model. A
conservative theory extension $\mathcal{T}^{+}$ of $\mathcal{T}$ adds to $\Sigma$ special constants, called values,
to name elements in the domain of $M_0$ as needed. Conservative means that a
$\mathcal{T}$-satisfiable formula is also $\mathcal{T}^{+}$-satisfiable.

The quantified SMA problem for theory $\mathcal{T}$ asks whether $M_0 \models \varphi$ for an
arbitrary formula $\varphi$ and an initial assignment of values to the variables in $FV(\varphi)$.
Formulas have the form
$\varphi = \exists\bar{x}.F[\bar{z},\bar{x},\bar{p}]\{p_i \leftarrow \exists\bar{y}_i.G_i[\bar{z},\bar{x},\bar{y}_i]\}_{i=1}^{k}$ described in
the introduction, where $FV(\varphi) = \bar{z}$ and quantified variables are standardized
apart. If $FV(\varphi) = \emptyset$, we still have SMA problems when considering subformulas
under an assignment to existentially quantified variables.


3 The QSMA Framework 


The QSMA algorithm works with a tree representation of a formula $\varphi$. A node $n$
in the tree is labeled with a pair $(\bar{x}, F)$, where $\bar{x}$ is a tuple of first-order variables,
called the local variables of $n$, and $F$ is a quantifier-free formula. The local
variables are implicitly existentially quantified: they are existentially quantified
variables whose quantifiers have been stripped, so that they are locally free, so
to speak, and can be assigned by the algorithm. An arc from a node $n$ to a child
node $b$ is labeled with a Boolean variable $p$. This Boolean variable stands as a
proxy for the quantified subformula represented by the subtree rooted at node
$b$. Therefore, the Boolean variable $p$ is also considered a proxy of $b$ itself.

A formula $\varphi$ may have free variables $FV(\varphi) = \bar{z}$, whose assignment is given
initially as part of the SMA problem instance. These variables are called rigid,
because their assignments do not change during the tree traversal. As the
algorithm traverses the tree, the local variables of a node $n$ are rigid from the point
of view of a child node $b$: their assignments do not change during the traversal
of the subtree rooted at $b$. Therefore, we represent a formula $\varphi$ as a pair formed
by a tuple of rigid variables and a labeled tree. Slightly abusing the terminology,
we call this pair a QSMA-tree. The root of a tree $T$ is denoted $root(T)$.


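Definition 1 (QSMA-tree). Given
$\varphi = \exists\bar{x}.F[\bar{z},\bar{x},\bar{p}]\{p_i \leftarrow \exists\bar{y}_i.G_i[\bar{z},\bar{x},\bar{y}_i]\}_{i=1}^{k}$,
where $FV(\varphi) = \bar{z}$ and $\psi_i = \exists\bar{y}_i.G_i[\bar{z},\bar{x},\bar{y}_i]$, $1 \leq i \leq k$, the QSMA-tree for $\varphi$ is
the pair $\mathcal{G} = (\bar{z}, T)$, where $\bar{z}$ is called the tuple of the rigid variables of $\mathcal{G}$, and $T$
is a labeled tree defined inductively as follows:


- If $k = 0$, $T$ consists of a single node $r$ labeled $(\bar{x}, F[\bar{z},\bar{x}])$;

- If $k > 0$, for all $i$, $1 \leq i \leq k$, let $\mathcal{G}_i = ((\bar{z},\bar{x}), T_i)$ be the QSMA-tree for $\psi_i$,
where $root(T_i)$ is a node $b_i$ labeled $(\bar{y}_i, G_i[\bar{z},\bar{x},\bar{y}_i])$. Then $T$ is the tree with a
new node $r$ labeled $(\bar{x}, F[\bar{z},\bar{x},\bar{p}])$ as root, $k$ outgoing arcs labeled $p_1,\ldots,p_k$,
and $b_1,\ldots,b_k$ as children.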


If subformula $\psi_i$ occurs more than once in $\varphi$, the same proxy variable $p_i$
is used for all occurrences. The ancestors of a node $n$ in $T$ are the nodes on
the unique path from $root(T)$ to $n$ excluding $n$ itself. If node $n$ in $T$ is labeled
$(\bar{x}, F)$, its $k$ outgoing arcs are labeled $p_1,\ldots,p_k$, and $\bar{x}_1,\ldots,\bar{x}_m$ are the local
variables of the ancestors of $n$, then $FV(F) \subseteq \{\bar{z},\bar{x}_1,\ldots,\bar{x}_m,\bar{x},p_1,\ldots,p_k\}$. The
set of the assignable variables at node $n$ is $Var(n) = \bar{x} \uplus \{p_1,\ldots,p_k\}$. The
set of the rigid variables at node $n$ is $Rigid(n) = \bar{z} \uplus \bar{x}_1 \uplus \ldots \uplus \bar{x}_m$. Thus,
$FV(F) \subseteq Rigid(n) \cup Var(n)$, $Rigid(root(T)) = \bar{z}$, and the QSMA-subtree rooted
at node $n$ is $\mathcal{G}_n = (Rigid(n), T_n)$. For a node $n$ with label $(\bar{x}, F)$, the components
of the label are denoted $n.\bar{x}$ and $n.F$. The label of the arc from $n$ to a child $b$ is
denoted $b.p$.


Example 1. Given $\exists x.((\forall y_1.F_1[x, y_1]) \Rightarrow (\forall y_2.F_2[x, y_2]))$ from Sect. 1.1, let
$\varphi = \exists x.((\exists y_1.\neg F_1[x, y_1]) \vee (\neg\exists y_2.\neg F_2[x, y_2])) = \exists x.(p_1 \vee \neg p_2)\{p_i \leftarrow \exists y_i.\neg F_i[x, y_i]\}_{i=1}^{2}$.
The QSMA-tree for $\varphi$ has root $r$ labeled $(x, p_1 \vee \neg p_2)$ with left child $b_1$ labeled
$(y_1, \neg F_1[x, y_1])$, right child $b_2$ labeled $(y_2, \neg F_2[x, y_2])$, and arcs from $r$ to $b_1$ and
from $r$ to $b_2$ labeled $p_1$ and $p_2$, respectively. Note how $FV(r.F) \subseteq \{x, p_1, p_2\}$,
$Var(r) = \{x, p_1, p_2\}$, and $Rigid(r) = \emptyset$. Also, $FV(b_1.F) \subseteq \{x, y_1\}$, $FV(b_2.F) \subseteq \{x, y_2\}$,
$Var(b_1) = \{y_1\}$, $Var(b_2) = \{y_2\}$, and $Rigid(b_1) = Rigid(b_2) = \{x\}$.
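The tree of Example 1 and its variable sets can be written down as a small data structure. The following is a minimal sketch, not the representation used in YicesQS: the class `Node` and the helpers `var` and `rigid` are hypothetical names, and node formulas are kept as plain text because only the tree shape matters here.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Node:
    local_vars: tuple                 # n.x: the local variables of the node
    formula: str                      # n.F: quantifier-free label, kept as text
    proxy: Optional[str] = None       # b.p: label of the incoming arc (None at the root)
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def add_child(self, proxy, child):
        # Attach child below self with the arc labeled by the given proxy variable
        child.proxy, child.parent = proxy, self
        self.children.append(child)
        return child

def var(n):
    # Var(n): local variables of n plus the proxies on its outgoing arcs
    return set(n.local_vars) | {b.proxy for b in n.children}

def rigid(n, free_vars=()):
    # Rigid(n): free variables of the input formula plus local variables of all ancestors
    result, ancestor = set(free_vars), n.parent
    while ancestor is not None:
        result |= set(ancestor.local_vars)
        ancestor = ancestor.parent
    return result

# The QSMA-tree of Example 1 (no free variables, so the rigid tuple of the tree is empty)
r = Node(("x",), "p1 | ~p2")
b1 = r.add_child("p1", Node(("y1",), "~F1[x,y1]"))
b2 = r.add_child("p2", Node(("y2",), "~F2[x,y2]"))
```

With this encoding, `var(r)` is `{"x", "p1", "p2"}`, `rigid(r)` is empty, and `rigid(b1)` and `rigid(b2)` are both `{"x"}`, matching the sets listed in the example.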


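Example 2. Consider $\forall x.((\exists y_1.(x \simeq 2{\cdot}y_1)) \Rightarrow (\exists y_2.(3{\cdot}x \simeq 2{\cdot}y_2)))$. A double
negation eliminates the $\forall$, yielding $\neg(\exists x.((\exists y_1.(x \simeq 2{\cdot}y_1)) \wedge (\forall y_2.(3{\cdot}x \not\simeq 2{\cdot}y_2))))$.
Again, a double negation eliminates the $\forall$, producing
$\neg(\exists x.((\exists y_1.(x \simeq 2{\cdot}y_1)) \wedge (\neg(\exists y_2.(3{\cdot}x \simeq 2{\cdot}y_2)))))$. Let
$\varphi = \exists x.((\exists y_1.(x \simeq 2{\cdot}y_1)) \wedge (\neg(\exists y_2.(3{\cdot}x \simeq 2{\cdot}y_2)))) = \exists x.(p_1 \wedge \neg p_2)\{p_1 \leftarrow \exists y_1.(x \simeq 2{\cdot}y_1),\ p_2 \leftarrow \exists y_2.(3{\cdot}x \simeq 2{\cdot}y_2)\}$.
The original formula is true in LRA iff $\varphi$ is false in LRA. The QSMA-tree for $\varphi$ has root $r$ labeled
$(x, p_1 \wedge \neg p_2)$ with left child $b_1$ labeled $(y_1, x \simeq 2{\cdot}y_1)$, right child $b_2$ labeled
$(y_2, 3{\cdot}x \simeq 2{\cdot}y_2)$, and arcs from $r$ to $b_1$ and from $r$ to $b_2$ labeled $p_1$ and $p_2$,
respectively. The variable sets of this tree are as in Example 1.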




Conversely, given a QSMA-tree $\mathcal{G} = (\bar{z}, T)$, we can associate a formula $n.\psi$
to any node $n$ in $T$ and hence to the QSMA-subtree $\mathcal{G}_n = (Rigid(n), T_n)$.


Definition 2 (Formula at a node). Given a QSMA-tree $\mathcal{G} = (\bar{z}, T)$, for all
nodes $n$ of $T$, the formula $n.\psi$ at node $n$ is defined inductively as follows:


- If $n$ is a leaf labeled $(\bar{x}, F[\bar{z},\bar{x}])$, then $n.\psi = \exists\bar{x}.F[\bar{z},\bar{x}]$;

- If $n$ has label $(\bar{x}, F[\bar{z},\bar{x},\bar{p}])$ and outgoing arcs labeled $p_1,\ldots,p_k$, $k > 0$,
connecting $n$ to children $b_1,\ldots,b_k$, let $b_1.\psi,\ldots,b_k.\psi$ be the formulas at $b_1,\ldots,b_k$.
Then $n.\psi = \exists\bar{x}.F[\bar{z},\bar{x},\bar{p}]\{p_i \leftarrow b_i.\psi\}_{i=1}^{k}$.


If $\mathcal{G} = (\bar{z}, T)$ is the QSMA-tree for $\varphi$ and $r = root(T)$, then $r.\psi = \varphi$.


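Example 3. For the QSMA-tree in Example 2, $b_1.\psi = \exists y_1.(x \simeq 2{\cdot}y_1)$,
$b_2.\psi = \exists y_2.(3{\cdot}x \simeq 2{\cdot}y_2)$, and
$r.\psi = \exists x.(p_1 \wedge \neg p_2)\{p_1 \leftarrow \exists y_1.(x \simeq 2{\cdot}y_1),\ p_2 \leftarrow \exists y_2.(3{\cdot}x \simeq 2{\cdot}y_2)\} = \exists x.((\exists y_1.(x \simeq 2{\cdot}y_1)) \wedge \neg(\exists y_2.(3{\cdot}x \simeq 2{\cdot}y_2))) = \varphi$.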
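Definition 2 is a bottom-up substitution of proxies by the formulas of the children. A minimal sketch of that recursion, with formulas handled as text: the dictionary encoding of nodes (keys `x`, `F`, `children`) and the function name `formula_at` are assumptions made for illustration only.

```python
import re

def formula_at(node):
    """Return n.psi as text: quantify n's local variables and replace every
    proxy on an outgoing arc by the formula at the corresponding child."""
    body = node["F"]
    kids = node["children"]          # maps proxy name -> child node
    if kids:
        # One pass over the label, so substituted text is never rescanned
        pattern = r"\b(" + "|".join(map(re.escape, kids)) + r")\b"
        body = re.sub(pattern, lambda m: f"({formula_at(kids[m.group(1)])})", body)
    return f"exists {', '.join(node['x'])}. ({body})"

# The QSMA-tree of Example 2
b1 = {"x": ["y1"], "F": "x = 2*y1", "children": {}}
b2 = {"x": ["y2"], "F": "3*x = 2*y2", "children": {}}
r = {"x": ["x"], "F": "p1 and not p2", "children": {"p1": b1, "p2": b2}}

psi_root = formula_at(r)
```

Here `psi_root` is the text `exists x. ((exists y1. (x = 2*y1)) and not (exists y2. (3*x = 2*y2)))`, which is the formula $\varphi$ of Example 2 up to notation.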


Since the input formula $\varphi$ is represented as a QSMA-tree $\mathcal{G} = (\bar{z}, T)$, the
problem of satisfying $\varphi$ becomes the problem of satisfying $\mathcal{G}$. Therefore, we define
satisfaction of a QSMA-tree next. Slightly abusing the notation, we use $\models$ also
for satisfaction of QSMA-trees.


Definition 3 (Satisfaction of a QSMA-tree). Given a QSMA-tree $\mathcal{G} = (\bar{z}, T)$
with $r = root(T)$, and an extension $M$ of $M_0$ to $Rigid(r) = \bar{z}$, $M \models \mathcal{G}$ if there
exists an extension $M'$ of $M$ to $Var(r)$ such that (i) $M' \models r.F$, and (ii) for all
children $b$ of $r$, $M'(b.p) = true$ iff $M' \models \mathcal{G}_b$.


The QSMA algorithm works by traversing the QSMA-tree $\mathcal{G} = (\bar{z}, T)$, and at
each node $n$ in $T$ it assigns the assignable variables in $Var(n) = \bar{x} \uplus \{p_1,\ldots,p_k\}$.
This assignment corresponds to the extension $M'$ in Definition 3. Let $b$ be a
child of $n$: the Boolean variable $b.p$ labeling the arc from $n$ to $b$ is a proxy for
the quantified subformula $b.\psi$ of the formula $n.\psi$. If $M'(b.p) = true$, the aim of
the algorithm is to show that $b.\psi$ is true, and if $M'(b.p) = false$, the aim is to
show that $b.\psi$ is false. Therefore Condition (ii) in Definition 3 says $M' \models \mathcal{G}_b$ if
$M'(b.p) = true$ and $M' \not\models \mathcal{G}_b$ if $M'(b.p) = false$. The next theorem shows that
satisfying a formula $\varphi$ and satisfying the QSMA-tree for $\varphi$ correspond.


Theorem 1. For all formulas $\varphi$ with $FV(\varphi) = \bar{z}$, for all models $M$ extending
$M_0$ to $\bar{z}$, if $\mathcal{G}$ is the QSMA-tree for $\varphi$ then $M \models \mathcal{G}$ iff $M \models \varphi$.
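On a finite toy structure, Definition 3 can be executed literally by enumerating all extensions to $Var(n)$. The sketch below does this for the tree of Example 2, with arithmetic modulo 7 standing in for LRA; this finite structure, the dictionary encoding of nodes, and the name `satisfies` are all assumptions made so that exhaustive search is possible, and none of them is part of QSMA itself.

```python
from itertools import product

def satisfies(node, env, domain):
    """Definition 3 by exhaustive search: look for an extension of env to
    Var(node) that satisfies the label and agrees with every child."""
    names = list(node["x"]) + list(node["children"])
    ranges = [domain] * len(node["x"]) + [(True, False)] * len(node["children"])
    for values in product(*ranges):
        ext = {**env, **dict(zip(names, values))}
        if not node["F"](ext):
            continue                      # condition (i) fails for this extension
        if all(ext[p] == satisfies(b, ext, domain)
               for p, b in node["children"].items()):
            return True                   # condition (ii) holds for all children
    return False

D = range(7)                              # toy domain: integers modulo 7
b1 = {"x": ["y1"], "F": lambda m: (m["x"] - 2 * m["y1"]) % 7 == 0, "children": {}}
b2 = {"x": ["y2"], "F": lambda m: (3 * m["x"] - 2 * m["y2"]) % 7 == 0, "children": {}}
root = {"x": ["x"], "F": lambda m: m["p1"] and not m["p2"],
        "children": {"p1": b1, "p2": b2}}

tree_result = satisfies(root, {}, D)

# Direct evaluation of the formula at the root, for comparison with Theorem 1
direct_result = any(
    any((x - 2 * y1) % 7 == 0 for y1 in D)
    and not any((3 * x - 2 * y2) % 7 == 0 for y2 in D)
    for x in D
)
```

Both `tree_result` and `direct_result` are false: as in LRA, the element 2 is invertible in this structure, so every $x$ has a matching $y_2$ and the root can never set $p_2$ to false consistently.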


Checking whether $M \models \mathcal{G}$ by testing all possible extensions $M'$ would not do,
because for most theories (e.g., LRA) there is an infinite number of extensions.
We need a way to weed out large parts of the space of candidate models. Let
$[\![\varphi]\!]$ denote the set of $\varphi$'s models. We introduce under-approximations and
over-approximations of $\varphi$ in order to under-approximate and over-approximate $[\![\varphi]\!]$.


Definition 4 (Under- and over-approximation). Let $\varphi$ be a formula with
$FV(\varphi) = \bar{z}$. Quantifier-free formulas $U$ and $O$ with $FV(U) = FV(O) = \bar{z}$ are,
respectively, an under-approximation and an over-approximation of $\varphi$, if for all
extensions $M$ of $M_0$ to $\bar{z}$, $M \models U$ implies $M \models \varphi$ and $M \models \varphi$ implies $M \models O$.


It follows that $[\![U]\!] \subseteq [\![\varphi]\!] \subseteq [\![O]\!]$. Let $\mathcal{G} = (\bar{z}, T)$ be the QSMA-tree for $\varphi$,
and $U$ and $O$ under- and over-approximations of $\varphi$, respectively. Then, $M \models U$
implies $M \models \varphi$ which implies $M \models \mathcal{G}$ by Theorem 1. Thus, satisfying an
under-approximation is a sufficient condition to have a solution. On the other
hand, $M \models \neg O$ implies $M \not\models \varphi$ which implies $M \not\models \mathcal{G}$ by Theorem 1. By the
contrapositive, if $M \models \mathcal{G}$ then $M \not\models \neg O$, that is, $M \models O$. Thus, satisfying
an over-approximation is a necessary condition to have a solution. In order to
construct such approximations, we assume to have a solver for theory $\mathcal{T}$ (and
model $M_0$) offering:


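— Model extension: A function SMA such that for all formulas $\exists\bar{x}.F[\bar{z},\bar{x}]$, where
$F[\bar{z},\bar{x}]$ is quantifier-free, and all extensions $M$ of $M_0$ to $\bar{z}$, $SMA(F[\bar{z},\bar{x}], M)$
returns either an extension $M'$ of $M$ to $\bar{x}$ such that $M' \models F[\bar{z},\bar{x}]$, or nil if
there is no such extension.

— Model-based under-approximation: A function MBU such that for all formulas
$\exists\bar{x}.F[\bar{z},\bar{x}]$, where $F[\bar{z},\bar{x}]$ is quantifier-free, and all extensions $M$ of $M_0$ to
$\bar{z}$ such that $M \models \exists\bar{x}.F[\bar{z},\bar{x}]$, $MBU(F[\bar{z},\bar{x}], \bar{x}, M)$ returns a quantifier-free
formula $U[\bar{z}]$ such that $M \models U[\bar{z}]$ and $\mathcal{T} \models U[\bar{z}] \Rightarrow (\exists\bar{x}.F[\bar{z},\bar{x}])$.

— Model-based over-approximation: A function MBO such that for all formulas
$\exists\bar{x}.F[\bar{z},\bar{x}]$, where $F[\bar{z},\bar{x}]$ is quantifier-free, and all extensions $M$ of $M_0$ to
$\bar{z}$ such that $M \not\models \exists\bar{x}.F[\bar{z},\bar{x}]$, $MBO(F[\bar{z},\bar{x}], \bar{x}, M)$ returns a quantifier-free
formula $O[\bar{z}]$ such that $M \not\models O[\bar{z}]$ and $\mathcal{T} \models (\exists\bar{x}.F[\bar{z},\bar{x}]) \Rightarrow O[\bar{z}]$.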

MBU and MBO produce, respectively, an under-approximation and an
over-approximation. Formula $U[\bar{z}]$ is true in model $M$ and implies $\exists\bar{x}.F[\bar{z},\bar{x}]$, and
hence can be seen as an interpolant between model and formula. It was called
model generalization [12,17], because $U[\bar{z}]$ may have other models in addition to
$M$. Formula $O[\bar{z}]$ follows from $\exists\bar{x}.F[\bar{z},\bar{x}]$ and is false in $M$, and hence can be seen
as a reverse interpolant between formula and model, called model interpolant [17].


4 The QSMA Algorithm and Its Total Correctness 


Let $\mathcal{G} = (\bar{z}, T)$ be the QSMA-tree for input formula $\varphi$ with $FV(\varphi) = \bar{z}$. Given a
model $M$ extending $M_0$ to $\bar{z}$, the QSMA algorithm determines whether $M \models \mathcal{G}$.
Suppose that $U$ and $O$ are under- and over-approximations of $\varphi$, respectively.
Picture $[\![U]\!]$, $[\![\varphi]\!]$, and $[\![O]\!]$ as bubbles. The $[\![U]\!]$ bubble is inside the $[\![\varphi]\!]$ bubble,
which is inside the $[\![O]\!]$ bubble. The idea of the algorithm is to zoom in on a
model of $\varphi$, by progressively weakening $U$, so that the $[\![U]\!]$ bubble inflates, and
progressively strengthening $O$, so that the $[\![O]\!]$ bubble deflates. The algorithm
operates in this manner for all subformulas of $\varphi$: for all nodes $n$ of $T$ it maintains
under- and over-approximations $n.U$ and $n.O$ of $n.\psi$, progressively weakening
$n.U$ and strengthening $n.O$. The weakening of $n.U$ is done by introducing a
disjunction with an MBU. The strengthening of $n.O$ is done by introducing a
conjunction with an MBO. The goal is that $M$ satisfies $n.U \vee \neg n.O$. As soon as
$M$ satisfies $n.U$, we know that $M \models \mathcal{G}_n$. As soon as $M$ satisfies $\neg n.O$, we know
that $M \not\models \mathcal{G}_n$.


@pre: $\mathcal{G} = (\bar{z}, T)$: QSMA-tree for $\varphi$ with $FV(\varphi) = \bar{z}$; $M$: extension of $M_0$ to $\bar{z}$
@post: rv iff $M \models \mathcal{G}$ (rv is “returned value”)

1: function QSMA(M, T)
2:   for all nodes n in T do
3:     n.U ← $\bot$
4:     n.O ← $\top$
5:   return SUBTREEISSOLVED(root(T), M)


Fig. 1. Pseudocode of the main function of the QSMA algorithm 


The main function QSMA (Fig. 1) initializes $n.U$ to $\bot$ (under-approximation
of all formulas and identity for disjunction) and $n.O$ to $\top$ (over-approximation
of all formulas and identity for conjunction) for all nodes $n$ of $T$. Then QSMA
calls the function subtreeIsSolved (Fig. 2) with arguments $root(T)$ and $M$.

Function subtreeIsSolved takes a node $n$ and a model $M$ extending $M_0$ to
$Rigid(n)$ and determines whether $M \models \mathcal{G}_n$. If $M \models n.U$ it returns true; if
$M \models \neg n.O$ it returns false (lines 3-5 in Fig. 2). Otherwise (i.e., $M \models \neg n.U \wedge n.O$), it
enters a loop whose body contains the following steps:


1. Build a formula $L$ as the conjunction of $n.F$ and a formula for every child $b$
of $n$, denoted $n \rightarrow b$ (line 7 in Fig. 2). The shape of the formula for $b$ is better
explained by considering a model of $L$ and hence in the next step.

2. Invoke the SMA function to search for an extension $M'$ of $M$ to $Var(n)$ such
that $M' \models L$ (line 8). For all children $b$ of $n$, $b.p \in Var(n)$ and $M'$ assigns a
Boolean value to $b.p$. If $M'(b.p) = true$, the subformula for $b$ in $L$ reduces to
$b.O$, so that $M' \models L$ implies $M' \models b.O$. Since QSMA seeks to satisfy $b.\psi$ and
$[\![b.\psi]\!] \subseteq [\![b.O]\!]$, it starts at least from a model of $b.O$. If $M'(b.p) = false$, the
subformula for $b$ in $L$ reduces to $\neg b.U$, so that $M' \models L$ implies $M' \models \neg b.U$.
Since QSMA seeks to falsify $b.\psi$ and $[\![b.U]\!] \subseteq [\![b.\psi]\!]$, it starts at least from a
model of $\neg b.U$. The proof of partial correctness¹ of subtreeIsSolved shows
that the existence of an $M'$ such that $M' \models L$ is necessary for $M \models \mathcal{G}_n$.


1 See https://mariapaola.github.io/CDSATandQSMA.html for a copy of this paper
with the proofs inserted.


@pre: $M$: extension of $M_0$ to $Rigid(n)$, and $I = \forall b \in T.\ [\![b.U]\!] \subseteq [\![b.\psi]\!] \subseteq [\![b.O]\!]$
@post: $I$ and $M \models (n.U \vee \neg n.O)$ and (rv iff $M \models \mathcal{G}_n$) and (rv iff $M \models n.U$) and ($\neg$rv iff $M \models \neg n.O$)

 1: function SUBTREEISSOLVED(n, M)
 2:   if $M \models n.U$ then
 3:     return true
 4:   else if $M \models \neg n.O$ then
 5:     return false
 6:   while true do
 7:     $L \leftarrow n.F \wedge \bigwedge_{n \rightarrow b}((b.p \Rightarrow b.O) \wedge (\neg b.p \Rightarrow \neg b.U))$
 8:     $M' \leftarrow SMA(L, M)$
 9:     if $M'$ = nil then
10:       $n.O \leftarrow n.O \wedge MBO(L, FV(L) \setminus Rigid(n), M)$
11:       return false
12:     else
13:       if SOLUTIONFORALLCHILDREN(n, $M'$) then
14:         $L' \leftarrow n.F \wedge \bigwedge_{n \rightarrow b}((b.p \Rightarrow b.U) \wedge (\neg b.p \Rightarrow \neg b.O))$
15:         $n.U \leftarrow n.U \vee MBU(L', FV(L') \setminus Rigid(n), M)$
16:         return true
17:
18: function SOLUTIONFORALLCHILDREN(n, M)
19:   for all children b of n do
20:     if $M(b.p) \neq$ SUBTREEISSOLVED(b, M) then
21:       return false
22:   return true

Fig. 2. Pseudocode of the auxiliary functions of the QSMA algorithm 


3. If SMA returns nil, then $M \not\models \mathcal{G}_n$; subtreeIsSolved updates $n.O$ to its
conjunction with $MBO(L, FV(L) \setminus Rigid(n), M)$ (line 10). Since $M \not\models L$, by
MBO's specification we know that $M \not\models MBO(L, FV(L) \setminus Rigid(n), M)$. This
update ensures that $M \not\models n.O$, so that $M \models \neg n.O$. Then subtreeIsSolved
returns false (line 11).

4. Otherwise, we have an extension $M'$ that satisfies $L$ and hence $n.F$, so that
there is potential for $M \models \mathcal{G}_n$. Function solutionForallChildren is invoked
to determine whether this is the case.

5. The function solutionForallChildren calls subtreeIsSolved for every
child $b$ of $n$. As soon as it finds a child $b$ such that $M(b.p) = true$ and
the call subtreeIsSolved(b,M) returns false, or $M(b.p) = false$ and the
call subtreeIsSolved(b,M) returns true, it returns false, because it found
a QSMA-subtree where candidate model $M$ fails. If this does not happen,
solutionForallChildren returns true.

6. If solutionForallChildren returns true, subtreeIsSolved builds a formula
$L'$ as the conjunction of $n.F$ and a formula for every child $b$ of $n$ (line 14). If
$M'(b.p) = true$, the subformula for $b$ in $L'$ reduces to $b.U$. If $M'(b.p) = false$,
the subformula for $b$ in $L'$ reduces to $\neg b.O$. The proof of partial correctness
of subtreeIsSolved shows that $M' \models L'$ and that $M' \models L'$ is a sufficient
condition for $M \models \mathcal{G}_n$. Then subtreeIsSolved updates $n.U$ to its
disjunction with $MBU(L', FV(L') \setminus Rigid(n), M)$ (line 15). Since $M' \models L'$,
by MBU's specification we know that $M' \models MBU(L', FV(L') \setminus Rigid(n), M)$.
This update ensures that $M' \models n.U$. Then subtreeIsSolved returns true
(line 16).

7. If solutionForallChildren returns false, the control returns to line 7. Sup- 
pose that solutionForallChildren returned false, because it found a child 
b of n such that M(b.p) = true and subtreeIsSolved(b,M) returned false. 
Then the call subtreeIsSolved(b,M) updated the formula b.O (line 10). Sup- 
pose that solutionForallChildren returned false, because it found a child 
b of n such that M(b.p) = false and subtreeIsSolved(b,M) returned true. 
Then the call subtreeIsSolved(b,M) updated the formula b.U (line 15). 
Either way the state has changed, variable L gets a new formula on line 7, 
and the subsequent call to SMA will not produce the same model. 
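The steps above can be sketched as executable code on a finite toy structure. This is an illustration under strong assumptions, not the YicesQS implementation: the domain is finite, SMA is replaced by exhaustive search, formulas are Python predicates, and $n.U$ and $\neg n.O$ are stored semantically as sets of assignments to $Rigid(n)$. MBU and MBO are the crudest admissible choices (accept or exclude exactly the current rigid assignment), which have finite basis only because the domain is finite. All names are hypothetical.

```python
from itertools import product

class Node:
    def __init__(self, local_vars, formula, children=None):
        self.x = list(local_vars)        # local variables n.x
        self.F = formula                 # n.F as a predicate on an assignment dict
        self.children = children or {}   # proxy name -> child node
        self.rigid = []                  # Rigid(n), filled in by prepare()
        self.U = set()                   # n.U = bottom: no rigid assignment accepted yet
        self.notO = set()                # complement of n.O; empty means n.O = top

def prepare(n, rigid=()):
    n.rigid = list(rigid)
    for b in n.children.values():
        prepare(b, n.rigid + n.x)

def key(n, m):
    # Restriction of model m to Rigid(n), as a hashable tuple
    return tuple(m[v] for v in n.rigid)

def sma(n, m, domain):
    # Exhaustive stand-in for SMA on L = n.F and, for each child b,
    # (b.p => b.O) and (not b.p => not b.U)
    names = n.x + list(n.children)
    ranges = [domain] * len(n.x) + [(True, False)] * len(n.children)
    for values in product(*ranges):
        ext = {**m, **dict(zip(names, values))}
        if n.F(ext) and all(
                (key(b, ext) not in b.notO) if ext[p] else (key(b, ext) not in b.U)
                for p, b in n.children.items()):
            return ext
    return None

def subtree_is_solved(n, m, domain):
    if key(n, m) in n.U:
        return True
    if key(n, m) in n.notO:
        return False
    while True:
        ext = sma(n, m, domain)
        if ext is None:
            n.notO.add(key(n, m))        # MBO stand-in: exclude this rigid assignment
            return False
        # solutionForallChildren: stops at the first child that disagrees
        if all(ext[p] == subtree_is_solved(b, ext, domain)
               for p, b in n.children.items()):
            n.U.add(key(n, m))           # MBU stand-in: accept this rigid assignment
            return True

def qsma(root, m, domain, free_vars=()):
    prepare(root, free_vars)
    return subtree_is_solved(root, m, domain)

D = list(range(7))                       # toy domain: integers modulo 7
# Tree of Example 2, read modulo 7
ex2 = Node(["x"], lambda m: m["p1"] and not m["p2"], {
    "p1": Node(["y1"], lambda m: (m["x"] - 2 * m["y1"]) % 7 == 0),
    "p2": Node(["y2"], lambda m: (3 * m["x"] - 2 * m["y2"]) % 7 == 0)})
# exists x. not exists y. y*y = x (mod 7): true, e.g. x = 3 is not a square
nonsquare = Node(["x"], lambda m: not m["p"], {
    "p": Node(["y"], lambda m: (m["y"] * m["y"]) % 7 == m["x"])})

ex2_result = qsma(ex2, {}, D)
nonsquare_result = qsma(nonsquare, {}, D)
```

On these inputs `ex2_result` is false and `nonsquare_result` is true. In the second run the loop rejects the candidates $x = 0, 1, 2$ one at a time, because each failed child call records the rejected value in the child's under-approximation, which is the state change described in Step 7.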


Example 4. Apply subtreeIsSolved to the root of the QSMA-tree in Example 1.
Formula $L$ gets $p_1 \vee \neg p_2$. SMA produces an $M'$ that assigns values to $x$, $p_1$, and
$p_2$. Suppose that $M'$ satisfies $p_1 \vee \neg p_2$ by assigning true to $p_1$. In the recursive
call on $b_1$, formula $L$ gets $\neg F_1[x, y_1]$. If SMA produces an $M''$ that extends $M'$
with an assignment to $y_1$ such that $M'' \models \neg F_1[x, y_1]$, we have a model. Suppose
that $M'$ satisfies $p_1 \vee \neg p_2$ by assigning false to $p_2$. In the recursive call on $b_2$,
formula $L$ gets $\neg F_2[x, y_2]$. If SMA fails to produce an $M''$ that extends $M'$ with
an assignment to $y_2$ such that $M'' \models \neg F_2[x, y_2]$, we have a model.


Theorem 2. The function subtreeIsSolved is partially correct: if the precon- 
ditions hold and the function halts, then the postconditions hold. 


For termination, we begin with the MBU and MBO functions. Let $\mathcal{T}$ be LRA
with a theory extension $\mathrm{LRA}^{+}$ that adds constant symbols $\hat{q}$ for all rational
numbers $q$. Consider an MBU function such that $MBU(F[\bar{z}, x], x, M) = F[\bar{z}, x]\{x \leftarrow \hat{q}\}$
and $M \models F[\bar{z}, \hat{q}]$. This kind of MBU is called generalization-by-substitution [12].
While $F[\bar{z}, \hat{q}]$ is an under-approximation of $\exists x.F[\bar{z}, x]$, this MBU is not a good
choice for termination. By applying MBU repeatedly with an infinite enumeration
of rational constants, the QSMA algorithm could build an infinite sequence of
under-approximations $(\bigvee_{i=1}^{n} F[\bar{z}, x]\{x \leftarrow \hat{q}_i\})_{n \in \mathbb{N}}$ none of which is LRA-equivalent
to $\exists x.F[\bar{z}, x]$. The next definition excludes such MBU functions, by requiring that
for a given formula and variable tuple (that depends on the formula), MBU can
generate only finitely many formulas.
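A concrete instance of this phenomenon, chosen here purely for illustration and not taken from the paper: let $F[z, x]$ be $x > z$, so that $\exists x.F[z, x]$ is true for every $z$. Generalization-by-substitution with the value $q$ yields the under-approximation $q > z$, and any finite disjunction of such formulas misses the rigid value $z = \max_i q_i$. The sketch below replays a few rounds with exact rationals; the function names are hypothetical.

```python
from fractions import Fraction

def mbu_by_substitution(q):
    # F[z, x] is x > z; substituting the model value q for x gives q > z
    return lambda z: q > z

def disjunction(formulas):
    return lambda z: any(u(z) for u in formulas)

approximations = []     # disjuncts accumulated in the under-approximation
gaps = []               # whether some rigid value is still uncovered after each round
z = Fraction(0)         # current rigid value
for _ in range(5):
    q = z + 1                              # a model value for x with x > z
    approximations.append(mbu_by_substitution(q))
    under = disjunction(approximations)
    assert under(z)                        # the current model is covered
    z = q                                  # next rigid value: the largest q so far
    gaps.append(not under(z))              # it is not covered by any disjunct
```

Every entry of `gaps` is true, so after each round there is still a rigid value on which the accumulated under-approximation differs from $\exists x.\, x > z$; nothing in this process stops after finitely many rounds.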


Definition 5 (Finite basis). An MBU function has finite basis if the set
$\{MBU(F[\bar{z},\bar{x}], \bar{x}, M) \mid M$ : extension of $M_0$ to $\bar{z}$ such that $M \models \exists\bar{x}.F[\bar{z},\bar{x}]\}$
is finite for all quantifier-free formulas $F[\bar{z},\bar{x}]$ and tuples $\bar{x}$.


The notion of an MBO function having a finite basis is defined in the same
way with $\not\models$ in place of $\models$.


Lemma 1. If MBU and MBO have finite basis, for all (possibly infinite) series
of calls $\{\text{subtreeIsSolved}(n, M_j)\}_j$, all satisfying the preconditions and all
terminating, formulas $n.U$ and $n.O$ are updated only a finite number of times.


Once nontermination due to MBU or MBO is excluded even for an infinite 
series of halting calls, termination is proved by induction on the QSMA-tree. 


Theorem 3. If the MBU and MBO functions have finite basis, whenever the 
preconditions are satisfied the function subtreeIsSolved halts. 


Example 5. Apply subtreeIsSolved to the root of the QSMA-tree in Example 2.
Formula $L$ gets $p_1 \wedge \neg p_2$. SMA produces an $M'$ that assigns values to $x$, $p_1$, and
$p_2$. Suppose that $M'$ assigns 1 to $x$, while it must assign true to $p_1$ and false to
$p_2$. In the recursive call on $b_1$, formula $L$ gets $x \simeq 2{\cdot}y_1$. If SMA produces an $M''$
that extends $M'$ with $y_1 \leftarrow 1/2$, we have a model of $\mathcal{G}_{b_1}$. In the recursive call on
$b_2$, formula $L$ gets $3{\cdot}x \simeq 2{\cdot}y_2$. If SMA produces an $M''$ that extends $M'$ with
$y_2 \leftarrow 3/2$, we have a model of $\mathcal{G}_{b_2}$, but because $M'(p_2) = false$, there is no model
of $\mathcal{G}$. Indeed, formula $\varphi$ of Example 2 is false as the original formula is true.


5 The OptiQSMA Algorithm and Its Total Correctness 


YicesQS implements an optimized variant of QSMA, called OptiQSMA, that
reduces the number of recursive calls to subtreeIsSolved by entrusting more
work to each call to SMA. Reconsider the behavior of QSMA in Example 4.
We can avoid a recursive call to subtreeIsSolved by asking SMA to satisfy
$(p_1 \vee \neg p_2) \wedge (p_1 \Rightarrow \neg F_1[x, y_1])$ in lieu of $p_1 \vee \neg p_2$. This way, if the candidate
model returned by SMA assigns true to $p_1$, it also assigns to $x$ and $y_1$ values
that satisfy $\neg F_1[x, y_1]$. This means that $\exists y_1.\neg F_1[x, y_1]$ is found true without
recursion. On the other hand, if $p_2$ is assigned false, the algorithm still has to
make the recursive call to see if it can satisfy $\exists y_2.\neg F_2[x, y_2]$.

The idea of OptiQSMA is to do a look-ahead on a path in the QSMA-tree,
doing the work in one shot rather than through recursive calls on all the nodes
in the path. The look-ahead applies to a path such that the Boolean labels of
all the arcs in the path are assigned true by the candidate model. The following
definition builds a formula to allow the look-ahead.


Definition 6 (Look-ahead formula). Given a QSMA-tree $\mathcal{G} = (\bar{z}, T)$, for all
nodes $n$ of $T$ the look-ahead formula of $n$ is $LF(n) = n.F \wedge \bigwedge_{n \rightarrow b}(b.p \Rightarrow LF(b))$.
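Definition 6 is a direct recursion over the tree. A minimal sketch that builds $LF(n)$ as text, using an assumed dictionary encoding of nodes (keys `F` and `children`) and a hypothetical function name:

```python
def look_ahead(node):
    """LF(n): the label n.F conjoined with (b.p => LF(b)) for every child b."""
    parts = [f"({node['F']})"]
    for proxy, child in node["children"].items():
        parts.append(f"({proxy} => {look_ahead(child)})")
    return " & ".join(parts)

# The QSMA-tree of Example 1, with formulas kept as text
b1 = {"F": "~F1[x,y1]", "children": {}}
b2 = {"F": "~F2[x,y2]", "children": {}}
r = {"F": "p1 | ~p2", "children": {"p1": b1, "p2": b2}}

lf_root = look_ahead(r)
```

For this tree `lf_root` is `(p1 | ~p2) & (p1 => (~F1[x,y1])) & (p2 => (~F2[x,y2]))`. When the candidate model sets $p_2$ to false the last conjunct holds vacuously, which leaves the formula discussed for Example 4 at the start of this section.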


The next definition distinguishes the nodes that are handled together in one 
shot without recursion and those where recursion is still needed. Nodes of the 
first kind are called no alternation nodes, because such nodes are on a path as 
described above, where all Boolean labels are assigned true and hence there is 
no alternation between true and false. Nodes of the second kind are called first 
alternation nodes, because they are the nodes reached by the first arc whose 
Boolean label is assigned false. 


@pre: $\mathcal{G} = (\bar{z}, T)$: QSMA-tree for $\varphi$ with $FV(\varphi) = \bar{z}$; $M$: extension of $M_0$ to $\bar{z}$
@post: rv iff $M \models_{la} \mathcal{G}$

1: function OPTIQSMA(M, T)
2:   for all nodes n in T do
3:     n.U ← $\bot$
4:   ans ← OPTISUBTREEISSOLVED(root(T), M)
5:   if ans = SAT(_) then
6:     return true
7:   else if ans = UNSAT(_) then
8:     return false


Fig. 3. Pseudocode of the main function of the OptiQSMA algorithm 


Definition 7 (No alternation nodes and first alternation nodes). Given
a QSMA-tree $\mathcal{G} = (\bar{z}, T)$, for all nodes $n$ of $T$ and extensions $M$ of $M_0$ to
$FV(LF(n))$, the set $NAN(n, M)$ of the no-alternation nodes from $n$ according
to $M$ (resp. the set $FAN(n, M)$ of the first-alternation nodes from $n$ according to
$M$) contains all and only the nodes $b$ such that: (i) $b$ is a descendant of $n$ through
a path $n \rightarrow n_1 \rightarrow \ldots \rightarrow n_q \rightarrow b$ ($q \geq 0$), (ii) $\forall i$, $1 \leq i \leq q$, $M(n_i.p) = true$, and
(iii) $M(b.p) = true$ (resp. $M(b.p) = false$).
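Both sets can be collected in one traversal that follows arcs labeled true and stops at the first arc labeled false. A minimal sketch, under the assumptions that nodes are dictionaries with a `children` map from proxy names to child nodes and that each node is identified by the proxy on its incoming arc; the function name and the sample tree are hypothetical.

```python
def nan_fan(node, model):
    """Return (NAN(n, M), FAN(n, M)) as lists of the proxies naming the nodes."""
    nan, fan = [], []
    for proxy, child in node["children"].items():
        if model[proxy]:
            nan.append(proxy)                       # reached through true arcs only
            deeper_nan, deeper_fan = nan_fan(child, model)
            nan += deeper_nan                       # keep following true arcs
            fan += deeper_fan
        else:
            fan.append(proxy)                       # first false arc: do not descend
    return nan, fan

# Illustrative tree: root -> {p1: b1 -> {p3, p4}, p2: b2 -> {p5}}
leaf = lambda: {"children": {}}
tree = {"children": {
    "p1": {"children": {"p3": leaf(), "p4": leaf()}},
    "p2": {"children": {"p5": leaf()}},
}}
model = {"p1": True, "p2": False, "p3": False, "p4": True, "p5": True}

nan_nodes, fan_nodes = nan_fan(tree, model)
```

With this model the no-alternation nodes are those under `p1` and `p4`, and the first-alternation nodes are those under `p3` and `p2`. The node under `p5` belongs to neither set, because the arc `p2` above it is assigned false.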


A node $b \in FAN(n, M)$ such that $q = 0$ in Condition (i) of Definition 7
is a child of $n$: for a child there is no optimization. The OptiQSMA algorithm
seeks a candidate model $M$ that satisfies $LF(n)$ and recurses only on the nodes
in $FAN(n, M)$. Therefore, the definition of satisfaction with look-ahead, denoted
$\models_{la}$, follows the pattern of Definition 3, replacing $r.F$ with $LF(r)$ and
Condition (ii) of Definition 3 with a condition for the nodes in the FAN set.


Definition 8 (Satisfaction with look-ahead). Given a QSMA-tree $\mathcal{G} = (\bar{z}, T)$
with $r = root(T)$ and an extension $M$ of $M_0$ to $Rigid(r) = \bar{z}$, $M \models_{la} \mathcal{G}$
if there exists an extension $M'$ of $M$ to $FV(LF(r))$ such that (i) $M' \models LF(r)$
and (ii) for all nodes $b \in FAN(r, M')$, $M' \not\models_{la} \mathcal{G}_b$.


Since for the nodes $b \in FAN(r, M')$ it is $M'(b.p) = false$, the $\models_{la}$ relation is
negated in Condition (ii). The next theorem shows that the optimization does
not change the problem.


Theorem 4. Given a QSMA-tree $\mathcal{G} = (\bar{z}, T)$ and an extension $M$ of $M_0$ to $\bar{z}$,
$M \models \mathcal{G}$ if and only if $M \models_{la} \mathcal{G}$.


The OptiQSMA algorithm maintains under-approximations $n.U$ of $n.\psi$ for all
nodes $n$, but not over-approximations. Accordingly, the main function OptiQSMA
(Fig. 3) initializes only $n.U$ for all nodes $n$, and then calls optiSubtreeIsSolved
(Fig. 4). This function returns SAT($U$) if $M \models_{la} \mathcal{G}$ and UNSAT($O$) if $M \not\models_{la} \mathcal{G}$.
The formula $U$ is an under-approximation of $r.\psi$ ($r = root(T)$) such that
$M \models U$. The formula $O$ is an over-approximation of $r.\psi$ such that $M \not\models O$. The
main function OptiQSMA has no usage for $U$ and $O$ and merely returns true
or false accordingly. Function optiSubtreeIsSolved builds and returns
under-approximations and over-approximations recursively. The reason for saving only


@pre: $M$ is an extension of $M_0$ to $Rigid(n)$, and $I = \forall b \in T.\ [\![b.U]\!] \subseteq [\![b.\psi]\!]$
@post: $I$ and
  {rv = UNSAT($O$) implies [($\forall b \in T.\ [\![b.\psi]\!] \subseteq [\![O]\!]$) and $M \not\models O$]} and
  {rv = SAT($U$) implies [($\forall b \in T.\ [\![b.U]\!] \subseteq [\![b.\psi]\!]$) and $M \models U$]}

 1: function OPTISUBTREEISSOLVED(n, M)
 2:   while true do
 3:     $L \leftarrow LF(n) \wedge \bigwedge_{n \rightarrow^{*} b}(\neg b.p \Rightarrow \neg b.U)$
 4:     $M' \leftarrow SMA(L, M)$
 5:     if $M'$ = nil then
 6:       return UNSAT($MBO(L, FV(L) \setminus Rigid(n), M)$)
 7:     else
 8:       reasons ← $\top$
 9:       if SOLUTIONFORALLDESCENDANTS(n, $M'$, reasons) then
10:         $L' \leftarrow LF(n) \wedge$ reasons
11:         return SAT($MBU(L', FV(L') \setminus Rigid(n), M)$)
12:
13: function SOLUTIONFORALLDESCENDANTS(n, M, reasons)
14:   for all $b \in FAN(n, M)$ do
15:     ans ← OPTISUBTREEISSOLVED(b, M)
16:     if ans = SAT($U$) then
17:       $b.U \leftarrow b.U \vee U$
18:       return false
19:     else if ans = UNSAT($O$) then
20:       reasons ← reasons $\wedge\ (\neg b.p \Rightarrow \neg O)$
21:   for all $b \in NAN(n, M)$ do
22:     reasons ← reasons $\wedge\ b.p$
23:   return true


Fig. 4. Pseudocode of the auxiliary functions of the OptiQSMA algorithm


under-approximations is practical, and will become clear after the illustration of
optiSubtreeIsSolved. This function takes a node $n$ and a model $M$ extending
$M_0$ to $Rigid(n)$ and determines whether $M \models_{la} \mathcal{G}_n$, by executing a loop whose
body contains the following steps:


1. Build a formula $L$ (line 3 in Fig. 4) as the conjunction of the look-ahead
formula $LF(n)$ (in lieu of $n.F$ in line 7 of Fig. 2) and a formula for every
descendant $b$ of $n$, denoted $n \rightarrow^{*} b$ (in lieu of child as in Fig. 2).

2. Invoke the SMA function to search for an extension $M'$ of $M$ to $Var(n)$
such that $M' \models L$. For those descendants $b$ for which $M'(b.p) = false$, the
subformula for $b$ in $L$ reduces to $\neg b.U$ as in Step 2 of the description of
subtreeIsSolved. For those descendants $b$ for which $M'(b.p) = true$, the
subformula for $b$ in $L$ reduces to true, in agreement with the fact that
over-approximations are not kept.

3. If SMA returns nil, optiSubtreeIsSolved returns UNSAT($O$), where $O$ is
simply the outcome of applying MBO to $L$ and $M$, as over-approximations
are not kept. Otherwise, there is potential for satisfaction with look-ahead.
Function optiSubtreeIsSolved initializes the formula reasons to $\top$ and
invokes solutionForallDescendants passing reasons by reference.

4. Function solutionForallDescendants considers first all descendants $b$ in
$FAN(n, M)$, and calls optiSubtreeIsSolved(b, M) for each of them. If this
call returns SAT($U$), it means that $M \models_{la} \mathcal{G}_b$; solutionForallDescendants
weakens $b.U$ by disjunction with $U$ and returns false.
If optiSubtreeIsSolved(b, M) returns UNSAT($O$), it means that $M \not\models_{la} \mathcal{G}_b$,
and we move on to the next descendant in $FAN(n, M)$. Prior to that, reasons
is strengthened by conjunction with $\neg b.p \Rightarrow \neg O$. For all descendants $b$ in
$NAN(n, M)$, solutionForallDescendants strengthens reasons by
conjunction with $b.p$.

5. If solutionForallDescendants returns true, optiSubtreeIsSolved builds
formula $L'$ as $LF(n) \wedge$ reasons, and returns SAT($U$), where $U$ is the outcome
of the application of MBU to $L'$ and $M$. Otherwise, the control returns to
line 3. Since solutionForallDescendants returned false, it means that it
found a node $b$ in $FAN(n, M)$ for which optiSubtreeIsSolved(b,M) returned
SAT($U$) and the formula $b.U$ was updated (line 17). Therefore the state has
changed, variable $L$ gets a new formula on line 3, and the subsequent call to
SMA will not produce the same model.


In the experiments it turned out that storing over-approximations for all
nodes is less efficient than using them to compute $L'$ and then forgetting them.
Thus, the over-approximation $O$ encapsulated in the UNSAT($O$) value returned
by a recursive call to optiSubtreeIsSolved is used to build the temporary
formula reasons, but it is not saved, and reasons is used to compute $L'$.


Theorem 5. The function optiSubtreeIsSolved is partially correct: if the pre- 
conditions hold and the function halts, then the postconditions hold. 


The proof of partial correctness of optiSubtreeIsSolved shows that every
model that satisfies $L' = (LF(n) \wedge$ reasons$)$ fulfills Definition 8. In this sense,
reasons is an explanation of why a model is found with look-ahead.


Theorem 6. If the MBU and MBO functions have finite basis, whenever the 
preconditions are satisfied the function optiSubtreeIsSolved halts. 


6 The YicesQS Solver and Experimental Results 


The OptiQSMA algorithm is implemented in YicesQS to equip Yices 2 with
support for quantifiers for complete theories (unrelated to Yices 2 support for
quantifiers in UF).² MBO is available as model interpolation from Yices's MCSAT [10]
solver for quantifier-free formulas, including theory-specific techniques for
bitvectors (BV) [15] and arithmetic. The latter are based on NLSAT [16] and
ultimately on Cylindrical Algebraic Decomposition (CAD). Basic MBU is done


2 See https://github.com/disteph/yicesQS and https://yices.csl.sri.com/.


BV: solved benchmarks (out of 970) and total time

Solver         Solved     Time
CVC5           854/970    25,584 s
Q3B            835/970    13,510 s
Z3             775/970     7,712 s
Bitwuzla       759/970    15,572 s
Q3B-pBDD       754/970    15,553 s
YicesQS        708/970     3,862 s
Ultim.Elim.    304/970     4,204 s

[Plot legend: YicesQS, Z3, CVC5, UltimateElim, Bitwuzla, Q3B, Q3B-pBDD]


Fig. 5. Plot for BV. 


as generalization-by-substitution [12] and improved with model-based projection
(e.g., [18]) for arithmetic, and invertibility conditions [21], including $\epsilon$-terms, for
BV. In YicesQS model-based projection also is based on CAD.

In the 2022 SMT competition, YicesQS entered the single-query, non-incremental
tracks of BV, LRA, LIA, NRA, and NIA (nonlinear integer arithmetic). The
experiments were run on the StarExec cluster with a 20 min timeout per
benchmark and 60 GB of memory. The benchmarks were a subset of the SMT-LIB
collection. The results presented below were computed by running the
competition script join.sh on the raw data from StarExec,³ sorting the data, and
producing the plots that are available online.⁴ A description of the participating
solvers can be found on the competition website.⁵

Figure 5 shows the results for BV, where YicesQS quickly solved a large number 
of benchmarks (compared for example with CVC5), but was not outstanding, 
possibly because YicesQS 2022 makes only limited use of invertibility conditions 
for model interpolation. Figure 6 shows the results for the four arithmetics. The 
columns on the left list the number of solved instances and the time to solve them 
for each logic and solver. In the plot on the right, each color corresponds to a solver 
and a point (x, y) of that color means that the x-th fastest-solved benchmark was 
solved by that solver in time y (log scale). 2021 Z3 is included because in some of 
these logics it performed slightly better than 2022 Z3. The logic where YicesQS 
performed best is LRA: it was the only solver to solve all 1,003 benchmarks. Z3 
2021 was second best, solving 948 benchmarks with a total runtime about 100 
times higher. YicesQS has neither a special treatment (e.g., simplex-based) of lin- 
ear problems, nor integer-specific techniques: it relies on CAD-based techniques 
for MBU and MBO also for integer problems. Thus, it is somewhat average 
on LIA and NIA. These two theories are undecidable (NRA due to division by 
0) and hence they lie outside of the theoretical framework of QSMA. YicesQS 


³ https://github.com/SMT-COMP/smt-comp/tree/master/2022/results. 
⁴ http://www.csl.sri.com/users/sgl/Work/Cade2023-data/index.html. 
⁵ https://smt-comp.github.io/2022/participants.html. 
LRA. 
Solver        Solved       Time 
YicesQS       1003/1003    414s 
Z3 2021       948/1003     41,068s 
Z3            936/1003     41,240s 
Ultim.Elim.   847/1003     16,136s 
CVC5          834/1003     21,197s 
Vampire       484/1003     45,326s 
SMTInterpol   164/1003     2,584s 

NRA. 
Solver        Solved       Time 
YicesQS       94/99        165s 
Z3 2021       94/99        315s 
Z3            90/99        294s 
CVC5          86/99        672s 
Vampire       83/99        73s 
Ultim.Elim.   6/99         33s 

LIA. 
Solver        Solved       Time 
Z3            300/300      11s 
CVC5          300/300      78s 
Z3 2021       292/300      10s 
Ultim.Elim.   230/300      11,789s 
YicesQS       182/300      750s 
Vampire       157/300      985s 
SMTInterpol   97/300       134s 
VeriT         75/300       1s 

NIA. 
Solver        Solved       Time 
CVC5          190/208      3,642s 
Ultim.Elim.   129/208      701s 
Z3            88/208       317s 
Z3 2021       87/208       53s 
YicesQS       80/208       290s 
Vampire       66/208       13,744s 

[Plots with one curve per solver: YicesQS, Z3, CVC5, UltimateElim, Vampire, Z3-2021, SMTInterpol, veriT.] 

Fig. 6. Plots for the four arithmetics. 
answers should still be correct, but termination can be lost. With Z3 being a 
non-competing participant in the SMT 2022 competition, YicesQS came second 
for Largest Contribution (single queries), because of its overall performance in 
the four arithmetics, where it also came first for satisfiable instances and in the 
24 sec timeout setup (instead of 20 min). 
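
As a minimal sketch of how the points of plots such as those in Figs. 5 and 6 can be derived from per-benchmark runtimes, the following Python fragment sorts the runtimes of the solved benchmarks and pairs each with its rank. The function names and the use of None for unsolved benchmarks are assumptions made for this illustration; this is not the competition script join.sh.

```python
from typing import Optional


def cactus_points(runtimes: list[Optional[float]]) -> list[tuple[int, float]]:
    """Return points (x, y): the x-th fastest-solved benchmark took y seconds.

    None marks a benchmark that was not solved within the timeout.
    """
    solved = sorted(t for t in runtimes if t is not None)
    return list(enumerate(solved, start=1))


def summary(runtimes: list[Optional[float]]) -> tuple[int, int, float]:
    """Return (number solved, number of benchmarks, total time on solved ones)."""
    solved = [t for t in runtimes if t is not None]
    return len(solved), len(runtimes), sum(solved)
```

The two values returned first by summary correspond to an entry such as 1003/1003 in the tables, and the third to the accompanying time column.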


7 Discussion: Related Work and Future Work 


Quantified SMT was approached by a procedure with an ∃-solver and a ∀-solver 
for prenex normal form formulas with ∃∀ prefix [12]. A formulation as a game 
between an ∃-player and a ∀-player appeared with the QSAT algorithm [3] for 
prenex normal form formulas with (∃∀)* prefix. QSMA accepts arbitrary formulas 
with quantifiers in arbitrary positions. 

Both QSAT and QSMA work for a generic theory T over basic T -specific com- 
ponents. QSAT uses model-based projection [3,18] and a solver for quantifier-free 
satisfiability that supports UNSAT cores. Model-based projection is an instance 
of MBU. An UNSAT core (as a conjunction) is an MBO in the special case 
where the input assignment is Boolean. While MBO can produce UNSAT cores, 
MBO generalizes the concept of UNSAT core with theory-specific reasoning when 
there are non-Boolean input assignments, as is the case in QSMA. It is unclear 
whether the combination of UNSAT cores and theory-specific MBU can emulate 
MBO or provide the same benefits. QSAT is implemented in Z3 and it is the 
default solver for LIA, LRA, and NRA. 

YicesQS is a recent implementation that only participated in the SMT com- 
petition in 2021 and 2022. Directions for further development include augmenting 
integer reasoning, and improving model interpolation in BV by a better usage of 
invertibility conditions. Another lead for future work is to compose QSMA within 
the CDSAT framework for conflict-driven reasoning in unions of theories [4-6]. 
For this, one may need to drop the assumption that there is a unique model 
$M_0$ and only its extensions need to be considered, which will be a generalization 
also in the single theory case. As most known MBU and MBO functions are for 
single theories, one may have to study how to get MBU and MBO functions 
for a union of theories from such functions for the component theories. Another 
issue is the interplay between QSMA’s recursive descent over the QSMA-tree for 
the formula and CDSAT’s conflict-driven search. 
Abstract. This paper introduces a uniform substitution calculus for 
dL_CHP, the dynamic logic of communicating hybrid programs. Uniform 
substitution enables parsimonious prover kernels by using axioms instead 
of axiom schemata. Instantiations can be recovered from a single proof 
rule responsible for soundness-critical instantiation checks rather than 
being spread across axiom schemata in side conditions. Even though 
communication and parallelism reasoning are notorious for necessitating 
subtle soundness-critical side conditions, uniform substitution when 
generalized to dL_CHP manages to limit and isolate their conceptual overhead. 
Since uniform substitution has proven to simplify the implementation 
of hybrid systems provers substantially, uniform substitution for dL_CHP 
paves the way for a parsimonious implementation of theorem provers for 
hybrid systems with communication and parallelism. 


Keywords: Uniform substitution - Parallel programs - Differential 
dynamic logic - Assumption-commitment reasoning - CSP 


1 Introduction 

Hybrid systems and parallel systems are notoriously subtle to analyze. 
Combining both not only compounds these subtleties but is further complicated 
because parallel hybrid systems are interlocked by synchronization in a shared 
global time. The dynamic logic of communicating hybrid programs dL_CHP [6] 
tames the complexity of parallel hybrid systems by providing a compositional proof 
calculus that disentangles reasoning into purely discrete, continuous, and 
communication pieces. However, the calculus is subject to schematic side conditions 
whose implementation is generally error-prone, causing large soundness-critical 
code bases [30]. In particular, compositional reasoning about parallelism as in 
the idealized proof rule in Fig. 1 poses the challenge of exhaustively characterizing 
all side conditions required to make all instances of this proof rule sound. Proof 
systems for discrete parallelism [1,19,27,35,44,46] already have complicated side 
conditions, but complexity only increases with continuous interactions in shared 
global time. 

Fig. 1. The proof rule is only sound under subtle side conditions (**). 

To support compositional reasoning for parallel hybrid systems in a modular 
way, this paper generalizes Church's uniform substitution [8] and 
develops a uniform substitution calculus [30-32] for dL_CHP. Uniform substitution 
modularizes the calculus itself, enabling its parsimonious implementation. 
Although applicable to discrete parallelism, the dL_CHP development resolves the 
inherent challenge that parallel hybrid systems always synchronize in time. 

Uniform substitution adopts a finite list of concrete formulas as axioms 
instead of an infinite set of formulas via axiom schemata with side conditions. 
This enables theorem provers without the extensive algorithmic checks otherwise 
required for each schema to sort out unsound instances. Thanks to the proof 
rule US for uniform substitution, only sound instances derive from the axioms 
such that the parallel composition rule in dL_CHP could be adopted almost 
literally as above, but with all the soundness-critical checking encapsulated solely 
in rule US. Thanks to US's checking, parallel systems reasoning even reduces 
to a single parallel injection axiom $[\alpha]\psi \to [\alpha \parallel \beta]\psi$ that merely describes the 
preservation of property $\psi$ of one parallel component $\alpha$ in the parallel system 
$\alpha \parallel \beta$. Proofs about $\alpha \parallel \beta$ reduce to a sequence of property embeddings with this 
axiom from local abstractions of the subcomponents, which combine soundly due 
to US. 

Soundness checks in uniform substitution are ultimately determined by the 
binding structures as identified in the static semantics. The development of 
uniform substitution for dL_CHP is, therefore, grounded in the following key 
observation: Communication and parallelism both cause additional binding structure 
that needs attention in the substitution process performed by rule US: 


(B I) Expressions depend on communication along (co)finite channel sets 
(besides finitely many free variables), which, by the core substitution principle 
[8], must not be introduced free into contexts where they are written. 


(B II) Subprograms in a parallel context need to be restricted in the variables 
and channels written as compositional proof rules for parallelism require local 
abstractions of subprograms not depending on the internals of the context [35]. 


Grounded in the need for abstraction (B II), $[\alpha]\psi \to [\alpha \parallel \beta]\psi$ can only be 
adopted as a sound axiom schema if $\alpha$ and $\beta$ do not share state, and if program $\beta$ 
does not interfere with the contract $\psi$, i.e., (i) $\psi$ has no free variables bound by $\beta$ 
(with exceptions), and (ii) $\psi$ does not depend on communication channels written 
by $\beta$ (except for channels joint with $\alpha$). This extensive side condition would need 
nontrivial soundness-critical implementations of dL_CHP axiom schemata. Still, 
uniform substitution can be lifted with only small changes locally checking for 
clashes with written channels, and prohibited variables or channels. 

The modularity of uniform substitution is the key to the parsimonious 
implementation [23] of the theorem prover KeYmaera X [11] for differential dynamic 
logic dL and differential game logic dGL [29], thus paving the way for a 
straightforward theorem prover implementation of dL_CHP. Since dL_CHP conservatively 
generalizes dL [6], its uniform substitution calculus inherits the complete [33] 
axiomatic treatment of differential equation invariants [30]. All proofs are in [7]. 


2 Dynamic Logic of Communicating Hybrid Programs 


This section briefly recaps dL_CHP [6], the dynamic logic of communicating hybrid 
programs (CHPs). It combines hybrid programs [28] with CSP-style communication 
and parallelism [15]. By assumption-commitment (ac) reasoning [22,46,47], 
dL_CHP allows compositional verification of parallelism in dL. For uniform 
substitution, function and predicate symbols, and program constants are added. 


2.1 Syntax 


The set of variables $V = V_\mathbb{R} \cup V_\mathbb{N} \cup V_\mathcal{T}$ has real ($V_\mathbb{R}$), integer ($V_\mathbb{N}$), and trace ($V_\mathcal{T}$) 
variables. For each $x \in V_\mathbb{R}$, the differential symbol $x'$ is in $V_\mathbb{R}$, too. The designated 
variable $\mu \in V_\mathbb{R}$ represents the shared global time. The set of channel names 
is $\Omega$. By convention $x, y \in V_\mathbb{R}$, $n \in V_\mathbb{N}$, $h \in V_\mathcal{T}$, $ch \in \Omega$, and $z \in V$. Channel set 
$Y \subseteq \Omega$ is (co)finite. Vectorial expressions are denoted $\bar{e}$. Moreover, $f^M, g^M$ are 
$M$-valued function symbols and $p, q, r$ are predicate symbols, where argument 
sorts are annotated by $\_ : M_1, \ldots, M_k$. Finally, $a, b$ are program constants. 


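Definition 1 (Terms). Terms consist of real ($\mathrm{Trm}_\mathbb{R}$), integer ($\mathrm{Trm}_\mathbb{N}$), channel 
($\mathrm{Trm}_\Omega$), and trace ($\mathrm{Trm}_\mathcal{T}$) terms, and are defined by the grammar below, where 
$\theta, \theta_1, \theta_2 \in \mathbb{Q}[V_\mathbb{R}] \subset \mathrm{Trm}_\mathbb{R}$ are polynomials in $V_\mathbb{R}$: 

\[
\begin{array}{lll}
\mathrm{Trm}_\mathbb{R}: & \eta_1, \eta_2 & ::= x \mid f^\mathbb{R}(Y, \bar{e}) \mid \eta_1 + \eta_2 \mid \eta_1 \cdot \eta_2 \mid (\theta)' \mid \mathrm{val}(te) \mid \mathrm{time}(te) \\
\mathrm{Trm}_\mathbb{N}: & ie_1, ie_2 & ::= n \mid f^\mathbb{N}(Y, \bar{e}) \mid ie_1 + ie_2 \mid |te| \\
\mathrm{Trm}_\Omega: & ce_1, ce_2 & ::= f^\Omega(Y, \bar{e}) \mid \mathrm{chan}(te) \\
\mathrm{Trm}_\mathcal{T}: & te_1, te_2 & ::= h \mid f^\mathcal{T}(Y, \bar{e}) \mid \langle ch, \theta_1, \theta_2 \rangle \mid te_1 \cdot te_2 \mid te \downarrow Y \mid te[ie]
\end{array}
\]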


Real terms are polynomials in $V_\mathbb{R}$ enriched with function symbols $f^\mathbb{R}(Y, \bar{e})$ 
(including constants $c \in \mathbb{Q}$) only depending on communication along channels $Y$ 
and terms $\bar{e}$, differential terms $(\theta)'$, and $\mathrm{val}(te)$ and $\mathrm{time}(te)$, which access 
the value and the timestamp of the last communication in $te$, respectively. By 
convention, $\theta \in \mathbb{Q}[V_\mathbb{R}]$ denotes a pure polynomial in $V_\mathbb{R}$ without $(\cdot)'$, $\mathrm{val}(\cdot)$, and 
$\mathrm{time}(\cdot)$ as they occur in programs. For simplicity, we do not define $\mathbb{Q}[V_\mathbb{R}] \subset \mathrm{Trm}_\mathbb{R}$ 
as a fifth term sort but use the convention that function symbols $g^\mathbb{R}$ can only 
be replaced with $\mathbb{Q}[V_\mathbb{R}]$-terms. Integer terms are variables $n$, function symbols 
$f^\mathbb{N}(Y, \bar{e})$ (including constants 0, 1), addition, and length $|te|$ of trace term $te$.¹ 
The function symbol $f^\Omega(Y, \bar{e})$ includes constants $ch \in \Omega$, and $\mathrm{chan}(te)$ is 
channel access. Trace terms record the communication history of programs. They 
encompass variables $h$, function symbols $f^\mathcal{T}(Y, \bar{e})$ (including the empty trace $\epsilon$), 
communication items $\langle ch, \theta_1, \theta_2 \rangle$ with value $\theta_1$ and timestamp $\theta_2$, projection 
$te \downarrow Y$ onto channels $Y$, and access $te[ie]$ of the $ie$-th item in $te$. Where useful, 
$op(\bar{e})$ denotes built-in function symbols of fixed interpretation, e.g., $\cdot + \cdot$. 

¹ Omitting multiplication results in decidable Presburger arithmetic [34]. 

dL_CHP's context-sensitive program and formula syntax presumes notions of 
free and bound variables (Sect. 2.3) defined on the context-free syntax: 


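Definition 2 (Programs). Communicating hybrid programs are defined by the 
following grammar, where $\theta \in \mathbb{Q}[V_\mathbb{R}]$ is a polynomial in $V_\mathbb{R}$ and $\chi \in \mathrm{FOL}_\mathbb{R}$ is a 
formula of first-order real arithmetic. In $\alpha \parallel \beta$, the subprograms must not share 
state but can share time and history, i.e., $\mathrm{BV}(\alpha) \cap \mathrm{BV}(\beta) \subseteq \{\mu, \mu'\} \cup V_\mathcal{T}$.² 

\[
\alpha, \beta ::= a(Y, Z) \mid x := \theta \mid x := * \mid\ ?\chi \mid \{x' = \theta \,\&\, \chi\} \mid \alpha; \beta \mid \alpha \cup \beta \mid \alpha^* \mid
ch(h)!\theta \mid ch(h)?x \mid \alpha \parallel \beta
\]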


The program constant $a(Y, Z)$ restricts the written channels to $Y \subseteq \Omega$ and 
the bound variables to $Z \subseteq V_\mathbb{R} \cup V_\mathcal{T}$, where $Y$ and $Z$ are (co)finite. Instead of 
$a(Y, Z)$, write $a$ if $Y$ and $Z$ can be arbitrary. Assignment $x := \theta$ updates $x$ to $\theta$, 
nondeterministic assignment $x := *$ assigns an arbitrary real value to $x$, and 
the test $?\chi$ does nothing if $\chi$ holds and aborts the computation otherwise. The 
continuous evolution $\{x' = \theta \,\&\, \chi\}$ follows the ODE $x' = \theta$ for any duration as long 
as formula $\chi$ is not violated. The global time $\mu$ evolves with every continuous 
evolution according to ODE $\mu' = 1$. Sequential composition $\alpha; \beta$ executes $\beta$ 
after $\alpha$, choice $\alpha \cup \beta$ executes $\alpha$ or $\beta$ nondeterministically, $\alpha^*$ repeats $\alpha$ zero 
or more times, $ch(h)!\theta$ sends $\theta$ along channel $ch$, and $ch(h)?x$ receives a value 
into variable $x$ along channel $ch$. The trace variable $h$ records communication. 
Finally, $\alpha \parallel \beta$ executes $\alpha$ and $\beta$ in parallel synchronized in global time $\mu$. 


Example 3. The program $ct^* \parallel ve^*$ models a simplified cruise control [24]. The 
vehicle $ve$ repeatedly receives a target velocity $v^{tr}_{ve}$ from the controller $ct$ along 
channel $tar$. The target $v^{tr}_{ct}$ sent by $ct$ is in range $[0, V]$. Hence, $ve$'s velocity $v_{ve}$ 
stays in range $[0, V]$ within the $\epsilon > 0$ time units till the next communication if 
$v_{ve} \in [0, V]$ held initially. The evolution $\{t' = 1\}$ allows passage of time in $ct$. 

\[
\begin{array}{l}
ct \equiv v^{tr}_{ct} := *;\ ?(0 \le v^{tr}_{ct} \le V);\ tar(h)!v^{tr}_{ct};\ \{t' = 1\} \\[4pt]
ve \equiv tar(h)?v^{tr}_{ve};\ a_{ve} := \dfrac{v^{tr}_{ve} - v_{ve}}{\epsilon};\ t_0 := \mu;\ \{v_{ve}' = a_{ve} \,\&\, \mu - t_0 \le \epsilon\}
\end{array}
\]
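
To make the program syntax concrete, the following Python fragment is a minimal sketch that encodes CHPs as an abstract syntax tree and checks the state-disjointness requirement of parallel composition from Definition 2. All class and function names, the ASCII variable names (such as vtr_ct and vtr_ve), and in particular the syntactic over-approximation of bound variables are assumptions made for this illustration; the paper characterizes bound variables semantically (Sect. 2.3).

```python
from dataclasses import dataclass

MU, MU_PRIME = "mu", "mu'"  # assumed ASCII names for global time and its differential


@dataclass(frozen=True)
class Assign:        # x := theta
    var: str
    term: str


@dataclass(frozen=True)
class AssignAny:     # x := *
    var: str


@dataclass(frozen=True)
class Test:          # ?chi
    formula: str


@dataclass(frozen=True)
class ODE:           # {x' = theta & chi}
    var: str
    rhs: str
    domain: str = "true"


@dataclass(frozen=True)
class Send:          # ch(h)!theta
    ch: str
    hist: str
    term: str


@dataclass(frozen=True)
class Receive:       # ch(h)?x
    ch: str
    hist: str
    var: str


@dataclass(frozen=True)
class Binary:        # op is one of ";", "u" (choice), "||"
    op: str
    left: object
    right: object


@dataclass(frozen=True)
class Loop:          # alpha*
    body: object


def seq(*progs):
    """Right-nested sequential composition of one or more programs."""
    result = progs[-1]
    for p in reversed(progs[:-1]):
        result = Binary(";", p, result)
    return result


def bound_vars(p):
    """Syntactic over-approximation of the bound variables of program p."""
    if isinstance(p, (Assign, AssignAny)):
        return {p.var}
    if isinstance(p, ODE):
        return {p.var, p.var + "'", MU, MU_PRIME}
    if isinstance(p, Send):
        return {p.hist}
    if isinstance(p, Receive):
        return {p.hist, p.var}
    if isinstance(p, Binary):
        return bound_vars(p.left) | bound_vars(p.right)
    if isinstance(p, Loop):
        return bound_vars(p.body)
    return set()     # a test binds nothing


def state_disjoint(par, trace_vars):
    """Check that the two sides share only global time and trace variables."""
    shared = bound_vars(par.left) & bound_vars(par.right)
    return shared <= {MU, MU_PRIME} | set(trace_vars)


ct = seq(AssignAny("vtr_ct"), Test("0 <= vtr_ct <= V"),
         Send("tar", "h", "vtr_ct"), ODE("t", "1"))
ve = seq(Receive("tar", "h", "vtr_ve"), Assign("a_ve", "(vtr_ve - v_ve)/eps"),
         Assign("t0", MU), ODE("v_ve", "a_ve", "mu - t0 <= eps"))
cruise = Binary("||", Loop(ct), Loop(ve))
```

Under these assumptions, state_disjoint(cruise, {"h"}) holds because the controller and the vehicle only share the trace variable h and the global time, whereas two parallel assignments to the same real variable would be rejected.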


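Definition 4 (Formulas). Formulas are defined by the grammar below for 
relations $\sim$, terms $e_1, e_2 \in \mathrm{Trm}$ of equal sort, and $z \in V$. Moreover, the 
ac-formulas are unaffected by state change in $\alpha$, i.e., $(\mathrm{FV}(A) \cup \mathrm{FV}(C)) \cap \mathrm{BV}(\alpha) \subseteq V_\mathcal{T}$. 

\[
\varphi, \psi, A, C ::= e_1 \sim e_2 \mid p(Y, \bar{e}) \mid \neg\varphi \mid \varphi \wedge \psi \mid \forall z\, \varphi \mid [\alpha]\psi \mid [\alpha]_{\{A,C\}}\psi
\]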


The formulas combine first-order dynamic logic with ac-reasoning. Predicate 
symbols $p(Y, \bar{e})$ depend on channels $Y$ and terms $\bar{e}$. The ac-box $[\alpha]_{\{A,C\}}\psi$ 
expresses that $C$ holds after each communication event and $\psi$ in the final state, 
for all runs of $\alpha$ whose incoming communication satisfies $A$. Other connectives $\vee$, 
$\to$, $\leftrightarrow$ and quantifiers $\exists z\, \varphi \equiv \neg\forall z\, \neg\varphi$ can be derived. The relations $\sim$ include $=$ 
for all term sorts, $\ge$ on real and integer terms, and prefixing $\preceq$ on trace terms. 

By convention, the predicate symbol $q_\mathbb{R}$ can only be replaced with formulas 
of first-order real arithmetic. It serves as placeholder for tests $?\chi$ in CHPs. 

² Previous work [6] disallows reading of variables bound in parallel as their change is 
not observable. This restriction is conceptually desirable but not soundness-critical. 
Here we drop it for simplicity, but it could be maintained by US as well. 


Example 5. The cruise control from Example 3 is safe if its velocity stays in 
range $[0, V]$. This can be expressed with the formula $\varphi \to [ct^* \parallel ve^*]\psi_{safe}$, where 
$\psi_{safe} \equiv 0 \le v_{ve} \le V$ and $\varphi \equiv \psi_{safe} \wedge \epsilon > 0 \wedge V > 0$. 


2.2 Semantics 


A trace $\tau = (\tau_1, \ldots, \tau_k)$ is a finite chronological sequence of communication events 
$\tau_i = \langle ch_i, d_i, s_i \rangle$, where $ch_i \in \Omega$, and $d_i \in \mathbb{R}$ is the communicated value, and 
$s_i \in \mathbb{R}$ is a timestamp such that $s_i \le s_j$ for $1 \le i < j \le k$. A recorded trace 
$\tau = (\tau_1, \ldots, \tau_k)$ additionally carries a trace variable $h_i \in V_\mathcal{T}$ with each event, i.e., 
$\tau_i = \langle h_i, ch_i, d_i, s_i \rangle$. For variable $z \in V_M$ and $M \in \{\mathbb{R}, \mathbb{N}, \mathcal{T}\}$, let $\mathrm{type}(z) = M$. A 
state $v$ maps each $z \in V$ to a value $v(z) \in \mathrm{type}(z)$. The sets of traces, recorded 
traces, and states are denoted $\mathcal{T}$, $\mathcal{T}_{rec}$, and $\mathcal{S}$, respectively. 

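For $d \in \mathrm{type}(z)$, the state $v_z^d$ is the modification of $v$ at $z$ to $d$. For $\tau \in \mathcal{T}_{rec}$, 
the trace $\tau(h) \in \mathcal{T}$ is obtained from the subsequence of $\tau$ carrying $h \in V_\mathcal{T}$ by 
removing the carried variable. State-trace concatenation $v \cdot \tau \in \mathcal{S}$ for $\tau \in \mathcal{T}_{rec}$ 
appends $\tau(h)$ to $v$ at $h$ for all $h \in V_\mathcal{T}$. The projection $\tau \downarrow Y$ of (recorded) trace $\tau$ 
is the subsequence of all communication events in $\tau$ whose channel is in $Y \subseteq \Omega$. 
The state projection $v \downarrow Y \in \mathcal{S}$ modifies $v$ at $h$ to $v(h) \downarrow Y$ for all $h \in V_\mathcal{T}$. 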
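
The trace operations just introduced can be sketched in a few lines of Python. The representation of recorded events as named tuples and of states as dictionaries, as well as all function names, are assumptions made for this illustration only.

```python
from typing import NamedTuple


class Event(NamedTuple):
    h: str        # trace variable that records the event
    ch: str       # channel name
    value: float  # communicated value
    time: float   # timestamp


def project(trace, channels):
    """Projection of a recorded trace: keep events whose channel is in `channels`."""
    return [e for e in trace if e.ch in channels]


def carried_by(trace, h):
    """Subsequence recorded by h, with the carried trace variable removed."""
    return [(e.ch, e.value, e.time) for e in trace if e.h == h]


def concat(state, trace, trace_vars):
    """State-trace concatenation: append the events carried by h at every h."""
    result = dict(state)
    for h in trace_vars:
        result[h] = list(state.get(h, [])) + carried_by(trace, h)
    return result


def project_state(state, channels, trace_vars):
    """State projection: project the history stored at every trace variable."""
    result = dict(state)
    for h in trace_vars:
        result[h] = [item for item in state.get(h, []) if item[0] in channels]
    return result


def is_chronological(trace):
    """Timestamps must be non-decreasing along the trace."""
    return all(a.time <= b.time for a, b in zip(trace, trace[1:]))
```

Real and integer variables are left untouched by concat and project_state, mirroring that only the trace variables of a state are affected by these operations.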

An interpretation $I$ assigns a function $I(f^M : M_1, \ldots, M_k) : \times_{i=1}^k M_i \to M$ to 
each function symbol $f^M$ that is smooth in all real-valued arguments if $M = \mathbb{R}$, 
and a relation $I(p : M_1, \ldots, M_k) \subseteq \times_{i=1}^k M_i$ to each $k$-ary predicate symbol $p$. 


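Definition 6 (Term Semantics). The valuation $Iv[\![e]\!] \in \mathbb{R} \cup \mathbb{N} \cup \Omega \cup \mathcal{T}$ of 
term $e$ in interpretation $I$ and state $v$ is defined as follows: 

\[
\begin{array}{rcll}
Iv[\![z]\!] & = & v(z) & \\
Iv[\![f(Y, e_1, \ldots, e_k)]\!] & = & I(f)(I\tilde{v}[\![e_1]\!], \ldots, I\tilde{v}[\![e_k]\!]) & \text{where } \tilde{v} = v \downarrow Y \\
Iv[\![op(e_1, \ldots, e_k)]\!] & = & op(Iv[\![e_1]\!], \ldots, Iv[\![e_k]\!]) & \text{for built-in } op \in \{\cdot + \cdot,\ \cdot \downarrow Y, \ldots\}
\end{array}
\]

The projection $\tilde{v} = v \downarrow Y$ ensures that $f(Y, \bar{e})$ only depends on $Y$, i.e., the 
communication in $v$ along channels $Y^\complement$ does not matter. The differentials $(\theta)'$ 
have a semantics describing the local rate of change of $\theta$ [30]. 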

The denotational semantics of CHPs [6] combines dL's Kripke semantics [30] 
with a linear history semantics [47] and a global notion of time. Denotations 
are subsets of $\mathcal{D} = \mathcal{S} \times \mathcal{T}_{rec} \times \mathcal{S}_\bot$ with $\mathcal{S}_\bot = \mathcal{S} \cup \{\bot\}$. Final state $\bot$ marks an 
unfinished computation, i.e., it still can be continued or was aborted due to a 
failing test. If ($w' = \bot$ and $\tau' \preceq \tau$), where $\preceq$ is the prefix relation on traces, 
or $(\tau', w') = (\tau, w)$, then $(\tau', w')$ is a prefix of $(\tau, w)$ written $(\tau', w') \preceq (\tau, w)$. 
Since (even empty) communication of unfinished computations is still observable, 
denotations $D \subseteq \mathcal{D}$ of CHPs are prefix-closed and total, i.e., $(v, \tau, w) \in D$ and 
$(\tau', w') \preceq (\tau, w)$ implies $(v, \tau', w') \in D$, and $\bot_\mathcal{D} \subseteq D$ with $\bot_\mathcal{D} = \mathcal{S} \times \{\epsilon\} \times \{\bot\}$. 
Moreover, all $(v, \tau, w) \in D$ are chronological, i.e., $v(\mu) \le w(\mu)$ and when $\tau = 
(\tau_1, \ldots, \tau_k) \ne \epsilon$ and $\tau_i(\mu) = \langle h_i, ch_i, d_i, s_i \rangle(\mu) = s_i$, then $v(\mu) \le \tau_1(\mu)$ and 
if $w \ne \bot$, then $\tau_k(\mu) \le w(\mu)$. Note that $\tau$ is chronological as all traces are. 
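
The prefix relation on pairs of communication and final state can be sketched as follows. The encoding of the marker for unfinished computations as None and the function names are assumptions for this illustration; states are assumed to be represented by values other than None.

```python
BOT = None  # assumed encoding of the marker for an unfinished computation


def prefixes(trace, final):
    """List all pairs (trace', final') that are prefixes of (trace, final)."""
    trace = tuple(trace)
    # Every prefix of the trace may be observed with an unfinished final state.
    result = [(trace[:k], BOT) for k in range(len(trace) + 1)]
    # The pair itself is a prefix as well (already listed if final is BOT).
    if final is not BOT:
        result.append((trace, final))
    return result


def is_prefix(small, big):
    """Decide whether `small` is a prefix of `big` for (trace, final state) pairs."""
    (t1, w1), (t2, w2) = small, big
    t1, t2 = tuple(t1), tuple(t2)
    return (t1, w1) == (t2, w2) or (w1 is BOT and t2[:len(t1)] == t1)
```

A denotation is prefix-closed in the sense above if, for each of its computations, it also contains every pair returned by prefixes for the same initial state.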

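The interpretation $I(a(Y, Z)) \subseteq \mathcal{D}$ of a program constant $a(Y, Z)$ is a 
prefix-closed and total set of chronological computations that (i) only communicate 
along (write) channels $Y$ and (ii) only bind variables $Z$. More precisely, for all 
$(v, \tau, w) \in I(a(Y, Z))$, we have (i) $\tau \downarrow Y^\complement = \epsilon$, and (ii) $v = w$ on $V_\mathcal{T}$ and $w \cdot \tau = v$ 
on $Z^\complement$. For $D, M \subseteq \mathcal{D}$, we define $D_\bot = \{(v, \tau, \bot) \mid (v, \tau, w) \in D\}$, and $(v, \tau, w) \in 
D \triangleright M$ if $(v, \tau_1, u) \in D$ and $(u, \tau_2, w) \in M$ exist with $\tau = \tau_1 \cdot \tau_2$. For states $w_\alpha, w_\beta$, 
the merged state $w_\alpha \oplus w_\beta$ is $\bot$ if one of the substates $w_\alpha$ or $w_\beta$ is $\bot$. Otherwise, 
$w_\alpha \oplus w_\beta = w_\alpha$ on $\mathrm{BV}(\alpha)$ and $w_\alpha \oplus w_\beta = w_\beta$ on $\mathrm{BV}(\alpha)^\complement$ (or, equivalently by 
syntactic well-formedness, on $\mathrm{BV}(\beta)^\complement$ and $\mathrm{BV}(\beta)$, respectively). If $Y$ is the set of 
all channel names occurring in $\alpha$, we write $\tau \downarrow \alpha$ for $\tau \downarrow Y$. 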


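Definition 7 (Program semantics). Given an interpretation $I$, the semantics 
$I[\![\alpha]\!] \subseteq \mathcal{D}$ of a CHP $\alpha$ is defined as follows, where $\bot_\mathcal{D} = \mathcal{S} \times \{\epsilon\} \times \{\bot\}$ and $\models$ 
denotes the satisfaction relation (Definition 8): 

\[
\begin{array}{rcl}
I[\![a(Y, Z)]\!] & = & I(a(Y, Z)) \\
I[\![x := \theta]\!] & = & \bot_\mathcal{D} \cup \{(v, \epsilon, w) \mid w = v_x^d \text{ where } d = Iv[\![\theta]\!]\} \\
I[\![x := *]\!] & = & \bot_\mathcal{D} \cup \{(v, \epsilon, w) \mid w = v_x^d \text{ where } d \in \mathbb{R}\} \\
I[\![?\chi]\!] & = & \bot_\mathcal{D} \cup \{(v, \epsilon, v) \mid Iv \models \chi\} \\
I[\![\{x' = \theta \,\&\, \chi\}]\!] & = & \bot_\mathcal{D} \cup \{(v, \epsilon, \varphi(s)) \mid v = \varphi(0) \text{ on } \{\mu', x'\}^\complement, \text{ and } \varphi(\zeta) = \varphi(0) \\
 & & \quad \text{on } \{x, x', \mu, \mu'\}^\complement, \text{ and } I\varphi(\zeta) \models \mu' = 1 \wedge x' = \theta \wedge \chi \text{ for all } \zeta \in [0, s] \text{ and} \\
 & & \quad \text{a solution } \varphi : [0, s] \to \mathcal{S} \text{ with } \varphi(\zeta)(z') = \tfrac{d\varphi(t)(z)}{dt}(\zeta) \text{ for } z \in \{x, \mu\}\} \\
I[\![ch(h)!\theta]\!] & = & \{(v, \tau, w) \mid (\tau, w) \preceq (\langle h, ch, d, v(\mu) \rangle, v) \text{ where } d = Iv[\![\theta]\!]\} \\
I[\![ch(h)?x]\!] & = & \{(v, \tau, w) \mid (\tau, w) \preceq (\langle h, ch, d, v(\mu) \rangle, v_x^d) \text{ where } d \in \mathbb{R}\} \\
I[\![\alpha \cup \beta]\!] & = & I[\![\alpha]\!] \cup I[\![\beta]\!] \\
I[\![\alpha; \beta]\!] & = & I[\![\alpha]\!] \mathbin{\hat{\triangleright}} I[\![\beta]\!] = (I[\![\alpha]\!])_\bot \cup (I[\![\alpha]\!] \triangleright I[\![\beta]\!]) \\
I[\![\alpha^*]\!] & = & \bigcup_{n \in \mathbb{N}} (I[\![\alpha]\!])^n = \bigcup_{n \in \mathbb{N}} I[\![\alpha^n]\!] \text{ where } \alpha^0 \equiv\ ?\top \text{ and } \alpha^{n+1} = \alpha; \alpha^n \\
I[\![\alpha_1 \parallel \alpha_2]\!] & = & \left\{ (v, \tau, w_{\alpha_1} \oplus w_{\alpha_2}) \;\middle|\;
  \begin{array}{l}
  (v, \tau \downarrow \alpha_j, w_{\alpha_j}) \in I[\![\alpha_j]\!] \text{ for } j = 1, 2, \text{ and} \\
  w_{\alpha_1} = w_{\alpha_2} \text{ on } \{\mu, \mu'\}, \text{ and } \tau = \tau \downarrow (\alpha_1 \parallel \alpha_2)
  \end{array} \right\}
\end{array}
\]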

The semantics is indeed constructed prefix-closed, total, and chronological. 
Communication $\tau$ of $\alpha_1 \parallel \alpha_2$ is implicitly characterized via its subsequences for 

102 M. Brieger et al. 


the subprograms. By $\tau = \tau \downarrow (\alpha_1 \parallel \alpha_2)$, there is no non-causal communication. 
Joint communication and the whole computation are synchronized in global 
time by the projections and by $w_{\alpha_1} = w_{\alpha_2}$ on $\{\mu, \mu'\}$, respectively. Likewise, by 
projection, communication is synchronously recorded by trace variables. 


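Definition 8 (Formula semantics). The satisfaction $Iv \models \varphi$ of a dL_CHP 
formula $\varphi$ in interpretation $I$ and state $v$ is inductively defined as follows: 

1. $Iv \models e_1 \sim e_2$ if $Iv[\![e_1]\!] \sim Iv[\![e_2]\!]$ where $\sim$ is any relation symbol 
2. $Iv \models p(Y, e_1, \ldots, e_k)$ if $(I\tilde{v}[\![e_1]\!], \ldots, I\tilde{v}[\![e_k]\!]) \in I(p)$ where $\tilde{v} = v \downarrow Y$ 
3. $Iv \models \varphi \wedge \psi$ if $Iv \models \varphi$ and $Iv \models \psi$ 
4. $Iv \models \neg\varphi$ if $Iv \not\models \varphi$, i.e., it is not the case that $Iv \models \varphi$ 
5. $Iv \models \forall z\, \varphi$ if $Iv_z^d \models \varphi$ for all $d \in \mathrm{type}(z)$ 
6. $Iv \models [\alpha]\psi$ if $Iw \cdot \tau \models \psi$ for all $(v, \tau, w) \in I[\![\alpha]\!]$ with $w \ne \bot$ 
7. $Iv \models [\alpha]_{\{A,C\}}\psi$ if for all $(v, \tau, w) \in I[\![\alpha]\!]$ the following conditions hold: 
   $\{Iv \cdot \tau' \mid \tau' \prec \tau\} \models A$ implies $Iv \cdot \tau \models C$ (commit) 
   ($\{Iv \cdot \tau' \mid \tau' \preceq \tau\} \models A$ and $w \ne \bot$) implies $Iw \cdot \tau \models \psi$ (post) 

Where $U \models \varphi$ for a set of interpretation-state pairs $U$ and any formula $\varphi$ if 
$Iv \models \varphi$ for all $Iv \in U$. In particular, $\emptyset \models \varphi$. 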


In items 6 and 7, reachable worlds are built from states $v$ and $w$, and 
communication $\tau$, as change of state and communication are observable. The strict 
prefix $\prec$ for the assumption in case (commit) in item 7 excludes (when $A = C$) 
the circularity that commitment $C$ can be shown in states where it is assumed. 


2.3 Static Semantics 


In the uniform substitution process, checks of free and bound variables, as well 
as accessed and written channels, separate sound from unsound axiom 
instantiations. As parallelism requires fine-grained control over channels, the static 
semantics for dL [30] is lifted to a communication-aware static semantics for dL_CHP. 
It uses accessed channels to characterize the subsequence of a communication 
trace influencing truth of a formula even more precisely than free variables. 

To precisely grasp free and bound variables, and accessed and written 
channels, Definition 9 gives a semantic characterization. In this section, formulas are 
considered truth-valued, i.e., $Iv[\![\varphi]\!] = \mathit{tt}$ if $Iv \models \varphi$ and $Iv[\![\varphi]\!] = \mathit{ff}$ if $Iv \not\models \varphi$. 


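Definition 9 (Static semantics). For term or formula $e$, and program $\alpha$, 
free variables $\mathrm{FV}(e)$ and $\mathrm{FV}(\alpha)$, bound variables $\mathrm{BV}(\alpha)$, accessed channels $\mathrm{CN}(e)$, 
and written channels $\mathrm{CN}(\alpha)$ form the static semantics. 

\[
\begin{array}{rcl}
\mathrm{FV}(e) & = & \{z \in V \mid \exists I, v, \tilde{v} \text{ such that } v = \tilde{v} \text{ on } \{z\}^\complement \text{ and } Iv[\![e]\!] \ne I\tilde{v}[\![e]\!]\} \\
\mathrm{CN}(e) & = & \{ch \in \Omega \mid \exists I, v, \tilde{v} \text{ such that } v \downarrow \{ch\}^\complement = \tilde{v} \downarrow \{ch\}^\complement \text{ and } Iv[\![e]\!] \ne I\tilde{v}[\![e]\!]\} \\
\mathrm{FV}(\alpha) & = & \{z \in V \mid \exists I, v, \tilde{v}, \tau, w \text{ such that } v = \tilde{v} \text{ on } \{z\}^\complement \text{ and } (v, \tau, w) \in I[\![\alpha]\!], \\
 & & \quad \text{and there is no } (\tilde{v}, \tilde{\tau}, \tilde{w}) \in I[\![\alpha]\!] \text{ such that } \tilde{\tau} = \tau \text{ and } w = \tilde{w} \text{ on } \{z\}^\complement\} \\
\mathrm{BV}(\alpha) & = & \{z \in V \mid \exists I, (v, \tau, w) \in I[\![\alpha]\!] \text{ such that } w \ne \bot \text{ and } (w \cdot \tau)(z) \ne v(z)\} \\
\mathrm{CN}(\alpha) & = & \{ch \in \Omega \mid \exists I, (v, \tau, w) \in I[\![\alpha]\!] \text{ such that } \tau \downarrow \{ch\} \ne \epsilon\}
\end{array}
\]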
The already subtle static semantics of hybrid systems [30] becomes even 
more subtle with communication and parallelism. For example, CHPs (silently) 
synchronize with the global time $\mu$, which is free and bound in ODEs, and the 
differential $\mu'$ is bound, i.e., $\mu \in \mathrm{FV}(\{x' = \theta \,\&\, \chi\})$ and $\mu, \mu' \in \mathrm{BV}(\{x' = \theta \,\&\, \chi\})$ 
if the evolution has a run of non-zero duration, regardless of whether $\mu$ occurs 
in $x$. Since reachable worlds of CHPs consist of communication and state, bound 
variables $\mathrm{BV}(\alpha)$ of program $\alpha$ compare $v$ with the state-trace concatenation $w \cdot \tau$ 
instead of missing $\tau$. Consequently, $h \in \mathrm{BV}(ch(h)!\theta) \subseteq \mathrm{FV}(ch(h)!\theta)$, which also 
reflects that the initial communication never gets lost. 


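Lemma 10 (Bound effect property). The sets $\mathrm{BV}(\alpha)$ and $\mathrm{CN}(\alpha)$ are the 
smallest sets with the bound effect property for program $\alpha$. That is, $v = w$ on $V_\mathcal{T}$ 
and $v = w \cdot \tau$ on $\mathrm{BV}(\alpha)^\complement$ if $w \ne \bot$, and $\tau \downarrow \mathrm{CN}(\alpha)^\complement = \epsilon$ for all $(v, \tau, w) \in I[\![\alpha]\!]$. 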


By the following communication-aware coincidence property, terms and 
formulas only depend on their free variables, which for trace variables can be further 
refined to the subtraces whose channels are accessed. This subtrace-level 
precision is crucial in the soundness proof of the parallel injection axiom as it allows 
dropping $\beta$ from $[\alpha \parallel \beta]\psi$ only if $\beta$ does not write channels of $\psi$ that are not also 
written by $\alpha$. The signature $\Sigma(\cdot)$ of an expression denotes all occurring symbols. 


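Lemma 11 (Coincidence for terms and formulas). The sets $\mathrm{FV}(e)$ and 
$\mathrm{CN}(e)$ are the smallest sets with the communication-aware coincidence property 
for term or formula $e$. That is, if $v \downarrow \mathrm{CN}(e) = \tilde{v} \downarrow \mathrm{CN}(e)$ on $\mathrm{FV}(e)$ and $I = J$ 
on $\Sigma(e)$, then $Iv[\![e]\!] = J\tilde{v}[\![e]\!]$. In particular, for formula $\varphi$: $Iv \models \varphi$ iff $J\tilde{v} \models \varphi$. 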


Programs communicate but do not depend on the recorded history, thus 
the coincidence property for programs is not communication-aware. However, 
programs can produce the same communication starting from coinciding states. 


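Lemma 12 (Coincidence for programs). The set $\mathrm{FV}(\alpha)$ is the smallest set 
with the coincidence property for program $\alpha$. That is, if $v = \tilde{v}$ on $X \supseteq \mathrm{FV}(\alpha)$, 
and $I = J$ on $\Sigma(\alpha)$, and $(v, \tau, w) \in I[\![\alpha]\!]$, then $(\tilde{v}, \tilde{\tau}, \tilde{w}) \in J[\![\alpha]\!]$ exists such that 
$w = \tilde{w}$ on $X$, and $\tau = \tilde{\tau}$, and ($w = \bot$ iff $\tilde{w} = \bot$). 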


3 Uniform Substitution for dL_CHP 


In dL_CHP, a uniform substitution [30] $\sigma$ maps function and predicate symbols to 
terms (of equal sort) and formulas, respectively, while substituting the arguments 
of the symbol for their placeholders in the replacement, and program constants 
are mapped to CHPs. For example, $\sigma = \{f(\cdot) \mapsto \cdot + 1,\ a \mapsto ch(h)?v; \{x' = v\}\}$ 
replaces all occurrences of function symbol $f$ with $\cdot + 1$ while the reserved 0-ary 
function symbol $\cdot$ marks the positions for the parameter of $f$ in the replacement. 
Moreover, $\sigma$ replaces the program constant $a$ with the program $ch(h)?v; \{x' = v\}$. 

The key to sound uniform substitution is that new free variables must not 
be introduced into a context where they are bound [8]. In the presence of 
communication, likewise, new channel access must not be introduced into contexts 
where the channel is written (B I). For parallelism, substitution must not reveal 
internals of the parallel context to the local abstraction of a subprogram (B II), 
and must not violate state disjointness. The one-pass approach [32] used for 
dL_CHP postpones these checks and simply applies the substitution recursively 
while collecting written variables and channels as taboo set (Fig. 2), thus 
operates linearly in the input. Clashes between the taboo, and new free variables and 
channel access are only checked locally at the replacement site. Likewise, clashes 
between the permitted channels and variables of a program constant, and its 
replacement program are checked locally. 

The substitution operator $\sigma^{U,W}_Z(\alpha)$ for program $\alpha$ takes an input taboo $U \subseteq 
V \cup \Omega$ and a parallel context $W \subseteq V$, and returns, if defined, the substitution 
result and a set of output taboos $Z \subseteq V \cup \Omega$. For terms and formulas, the 
substitution operator $\sigma^U$ only takes a taboo $U \subseteq V \cup \Omega$ as input. The substitution 
process clashes, i.e., prevents unsound instantiation, if it were to introduce a free 
variable or accessed channel into a context where it is bound (B I) or if it were to 
write variables and channels violating abstraction (B II). Moreover, substitution 
preserves well-formedness of programs and formulas, i.e., substitution clashes if 
replacements were to violate well-formedness. 
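
The one-pass idea of threading a taboo through the substitution and checking clashes only at the replacement site can be sketched in Python for a deliberately tiny fragment. This is an illustration under strong simplifying assumptions, not the calculus of this paper: it covers only variables, numbers, addition, unary or constant function symbols, assignments, sequential composition, a comparison, and boxes, and it ignores channels, parallel contexts, and program constants. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Dot:
    """Reserved placeholder marking the argument position in a replacement."""

@dataclass(frozen=True)
class Plus:
    left: object
    right: object

@dataclass(frozen=True)
class Fn:                      # function symbol applied to arg (None for constants)
    name: str
    arg: Optional[object] = None

@dataclass(frozen=True)
class Assign:                  # var := term
    var: str
    term: object

@dataclass(frozen=True)
class Seq:                     # first; second
    first: object
    second: object

@dataclass(frozen=True)
class Geq:                     # left >= right
    left: object
    right: object

@dataclass(frozen=True)
class Box:                     # [prog] post
    prog: object
    post: object

class Clash(Exception):
    """Raised when a replacement would introduce a tabooed free variable."""

def free_vars(t):
    if isinstance(t, Var):
        return frozenset({t.name})
    if isinstance(t, Plus):
        return free_vars(t.left) | free_vars(t.right)
    if isinstance(t, Fn) and t.arg is not None:
        return free_vars(t.arg)
    return frozenset()

def fill(t, arg):
    """Replace every placeholder Dot in t by arg."""
    if isinstance(t, Dot):
        return arg
    if isinstance(t, Plus):
        return Plus(fill(t.left, arg), fill(t.right, arg))
    if isinstance(t, Fn) and t.arg is not None:
        return Fn(t.name, fill(t.arg, arg))
    return t

def subst_term(sigma, taboo, t):
    if isinstance(t, Plus):
        return Plus(subst_term(sigma, taboo, t.left), subst_term(sigma, taboo, t.right))
    if isinstance(t, Fn):
        arg = None if t.arg is None else subst_term(sigma, taboo, t.arg)
        if t.name not in sigma:
            return Fn(t.name, arg)
        if free_vars(sigma[t.name]) & taboo:   # local check at the replacement site
            raise Clash(t.name)
        return fill(sigma[t.name], arg)
    return t                                   # variables and numbers are unchanged

def subst_prog(sigma, taboo, p):
    """Return the substituted program together with its output taboo."""
    if isinstance(p, Assign):
        return Assign(p.var, subst_term(sigma, taboo, p.term)), taboo | {p.var}
    first, z1 = subst_prog(sigma, taboo, p.first)
    second, z2 = subst_prog(sigma, z1, p.second)   # taboo of first is passed on
    return Seq(first, second), z2

def subst_formula(sigma, taboo, phi):
    if isinstance(phi, Geq):
        return Geq(subst_term(sigma, taboo, phi.left), subst_term(sigma, taboo, phi.right))
    prog, out = subst_prog(sigma, taboo, phi.prog)
    return Box(prog, subst_formula(sigma, out, phi.post))  # postcondition sees output taboo

def clashes(sigma, phi):
    try:
        subst_formula(sigma, frozenset(), phi)
    except Clash:
        return True
    return False
```

In this sketch, replacing a constant symbol c by the variable x inside the postcondition of [x := 1] clashes, because x is in the output taboo of the assignment, whereas replacing f(·) by · + 1 is admissible everywhere since the replacement has no free variables.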


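\[
\begin{array}{rcll}
\sigma^U(z) & = & z & \text{for } z \in V \\
\sigma^U(f(Y, e)) & = & \{\cdot \mapsto \sigma^U(e \downarrow Y)\}^\emptyset(\sigma f(\cdot)) & \text{if } (\mathrm{FV}(\sigma f(\cdot)) \cup \mathrm{CN}(\sigma f(\cdot))) \cap U = \emptyset \\
\sigma^U(op(e_1, \ldots, e_k)) & = & op(\sigma^U(e_1), \ldots, \sigma^U(e_k)) & \text{for built-in } op \in \{\cdot + \cdot,\ \cdot \downarrow Y, \ldots\} \\
\sigma^U((\theta)') & = & (\sigma^{V \cup \Omega}(\theta))' & \\
\sigma^U(e_1 \sim e_2) & = & \sigma^U(e_1) \sim \sigma^U(e_2) & \\
\sigma^U(p(Y, e)) & = & \{\cdot \mapsto \sigma^U(e \downarrow Y)\}^\emptyset(\sigma p(\cdot)) & \text{if } (\mathrm{FV}(\sigma p(\cdot)) \cup \mathrm{CN}(\sigma p(\cdot))) \cap U = \emptyset \\
\sigma^U(\neg\varphi) & = & \neg\sigma^U(\varphi) & \\
\sigma^U(\varphi \wedge \psi) & = & \sigma^U(\varphi) \wedge \sigma^U(\psi) & \\
\sigma^U(\forall z\, \varphi) & = & \forall z\, \sigma^{U \cup \{z\}}(\varphi) & \\
\sigma^U([\alpha]\psi) & = & [\sigma^{U,\emptyset}_Z(\alpha)]\,\sigma^Z(\psi) & \\
\sigma^U([\alpha]_{\{A,C\}}\psi) & = & [\sigma^{U,\emptyset}_Z(\alpha)]_{\{\sigma^Z(A), \sigma^Z(C)\}}\,\sigma^Z(\psi) & \\[4pt]
\sigma^{U,W}_{U \cup \mathrm{BV}(\sigma a) \cup \mathrm{CN}(\sigma a)}(a(Y, Z)) & = & \sigma a & \text{if } \mathrm{BV}(\sigma a) \subseteq Z \text{ and } \mathrm{CN}(\sigma a) \subseteq Y \\
\sigma^{U,W}_{U \cup \{x\}}(x := \theta) & = & x := \sigma^{U \cup W}(\theta) & \\
\sigma^{U,W}_{U \cup \{x\}}(x := *) & = & x := * & \\
\sigma^{U,W}_{U}(?\chi) & = & ?\sigma^{U \cup W}(\chi) & \\
\sigma^{U,W}_{Z}(\{x' = \theta \,\&\, \chi\}) & = & \{x' = \sigma^{Z \cup W}(\theta) \,\&\, \sigma^{Z \cup W}(\chi)\} & \text{with } Z = U \cup \{x, x', \mu, \mu'\} \\
\sigma^{U,W}_{U \cup \{ch, h\}}(ch(h)!\theta) & = & ch(h)!\sigma^{U \cup W}(\theta) & \\
\sigma^{U,W}_{U \cup \{ch, h, x\}}(ch(h)?x) & = & ch(h)?x & \\
\sigma^{U,W}_{Z_1 \cup Z_2}(\alpha \cup \beta) & = & \sigma^{U,W}_{Z_1}(\alpha) \cup \sigma^{U,W}_{Z_2}(\beta) & \\
\sigma^{U,W}_{Z_2}(\alpha; \beta) & = & \sigma^{U,W}_{Z_1}(\alpha);\ \sigma^{Z_1,W}_{Z_2}(\beta) & \\
\sigma^{U,W}_{Z}(\alpha^*) & = & (\sigma^{Z,W}_{Z}(\alpha))^* & \text{when } \sigma^{U,W}_{Z}(\alpha) \text{ is defined} \\
\sigma^{U,W}_{Z_1 \cup Z_2}(\alpha \parallel \beta) & = & \sigma^{U,W_\beta}_{Z_1}(\alpha) \parallel \sigma^{U,W_\alpha}_{Z_2}(\beta) &
\end{array}
\]

Fig. 2. Application of uniform substitution for taboo $U$ and parallel context $W$, where 
$W_\gamma = W \cup (\mathrm{BV}(\sigma^{U,W}(\gamma)) \setminus (\{\mu, \mu'\} \cup V_\mathcal{T}))$ for any program $\gamma$, and $e \downarrow Y$ for term $e$ is 
recursive push down of projection $\downarrow Y$, where $p(Y_0, e) \downarrow Y = p(Y_0 \cap Y, e)$. 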


Uniform Substitution for Communicating Hybrid Programs 105 


The side condition $(\mathrm{FV}(\sigma f(\cdot)) \cup \mathrm{CN}(\sigma f(\cdot))) \cap U = \emptyset$ implements locally that 
the replacement for $f$ must not introduce free parameters that are tabooed by $U$ 
(B I). The substitution $\{\cdot \mapsto \sigma^U(e \downarrow Y)\}^\emptyset$ is responsible for the argument $e$,³ 
where $\emptyset$ suffices as the taboo $U$ is already checked on $e \downarrow Y$. By the projection, 
$e \downarrow Y$ only depends on channels $Y$. Quantification $\forall z$ taboos the bound variable $z$. 
Program $\alpha$ in a box or ac-box has an empty parallel context $\emptyset$. 

The substitution $\sigma^{U,W}_Z(\alpha)$ computes the output taboo $Z$ by adding the written 
variables and channels of program $\alpha$ to $U$, e.g., real variable $x$ for assignment 
$x := \theta$ and for receiving $ch(h)?x$ additionally channel $ch$ and trace variable $h$. 
The output taboo $Z$ is passed to ac-formulas and postconditions of boxes and 
ac-boxes for recursive checks for clashes w.r.t. (B I). Crucially for soundness, 
Lemma 13 below proves that $\sigma^{U,W}_Z(\alpha)$ correctly computes the output taboo $Z$. 

The taboo $U \cup W$ passed to nested expressions contains the parallel context $W$ 
to prevent free variables in replacements of function and predicate symbols that 
are bound in parallel. This prepares the substitution process to preserve the 
syntax restrictions for parallel composition from previous work [6].⁴ Substitution 
for evolution $\{x' = \theta \,\&\, \chi\}$ considers that the global time $\mu, \mu'$ is always 
implicitly bound regardless of whether it occurs in $x, x'$. The fixpoint notation 
$\sigma^{Z,W}_Z(\alpha)$ for the replacement of repetition $\alpha^*$ ensures that the output taboo of 
the first iteration is tabooed in the subsequent iterations [32]. Computing the 
parallel context of $\alpha$ and $\beta$ in case $\alpha \parallel \beta$ requires one additional pass for both 
subprograms because what they potentially bind after substitution adds to the 
parallel context of the respective other subprogram. 


Lemma 13 (Correct output taboo). Application $\sigma^{U,W}_{Z}(\alpha)$ of uniform substitution retains input taboo $U$ and correctly adds the bound variables and written channels of program $\alpha$, i.e., $Z \supseteq U \cup \mathrm{BV}(\sigma^{U,W}_{Z}(\alpha)) \cup \mathrm{CN}(\sigma^{U,W}_{Z}(\alpha))$.


The side condition of $\sigma^{U,W}(a(Y, Z))$ maintains local abstraction of subprograms (B II) because the replacement cannot bind more than $a(Y, Z)$, thus cannot bind variables and channels of an abstraction that is independent of $a(Y, Z)$. This also preserves state-disjointness (well-formedness) of parallel programs.


3.1 Semantic Effect of Uniform Substitution 


The key ingredients for proving soundness of uniform substitution are Lemmas 16 and 17 below. They prove that the effect of the syntactic transformation applied by uniform substitution can be equally mimicked by semantically modifying the interpretation of function and predicate symbols, and program constants. This adjoint interpretation $\sigma^{*}_{\omega} I$ for interpretation $I$ and state $\omega$ changes how symbols are interpreted according to their syntactic replacements in the substitution $\sigma$.


3 Extension to vectorial arguments is straightforward.
4 For $\alpha \parallel \beta$, the restriction is $(\mathrm{V}(\alpha) \cap \mathrm{BV}(\beta)) \cup (\mathrm{V}(\beta) \cap \mathrm{BV}(\alpha)) \subseteq \{\mu, \mu'\} \cup V_T$ [6]. However, in this paper, programs obey a less restrictive syntax for simplicity.
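Definition 14 (Adjoint substitution). For interpretation $I$ and state $\omega$, the adjoint interpretation $\sigma^{*}_{\omega} I$ changes the meaning of function and predicate symbols, and program constants according to the substitution $\sigma$ evaluated in state $\omega$:

$\sigma^{*}_{\omega} I(f : M_{arg} \to M) : M_{arg} \to M;\ d \mapsto I^{d}_{\cdot}\,\omega[\![\sigma f(\cdot)]\!]$ where $M, M_{arg} \in \{\mathbb{R}, \mathbb{N}, \Omega, \mathcal{T}\}$
$\sigma^{*}_{\omega} I(p : M_{arg}) = \{d \in M_{arg} \mid I^{d}_{\cdot}\,\omega \vDash \sigma p(\cdot)\}$ where $M_{arg} \in \{\mathbb{R}, \mathbb{N}, \Omega, \mathcal{T}\}$
$\sigma^{*}_{\omega} I(a(Y, Z)) = I[\![\sigma a]\!]$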


We follow the observation for dGL [32] that the more liberal one-pass substitution requires stronger coincidence between the substitution and the adjoint on neighborhoods of the original state. Where the dGL soundness proof has succeeded by a neighborhood semantics of state on taboos, the dL_CHP proof succeeds with a generalization to a neighborhood semantics of state and communication on taboos. The neighborhood of a state consists of its variations:


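Definition 15 (Variation). For a set $U \subseteq V \cup \Omega$, a state $\nu$ is a $U$-variation of state $\omega$ if $\nu$ and $\omega$ only differ on variables or projections onto channels in $U$, i.e., $\nu \downarrow (U^{\complement} \cap \Omega) = \omega \downarrow (U^{\complement} \cap \Omega)$ on $U^{\complement} \cap V$.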


The proofs of Lemmas 16 and 17 follow a lexicographic induction on the structure of substitution, and term, formula, or program. In Lemma 17, the induction is mutual for formulas and programs.


Lemma 16 (Semantic uniform substitution). The term $e$ evaluates equally over $U$-variations under uniform substitution $\sigma^{U}$ and adjoint interpretation $\sigma^{*}_{\omega} I$, i.e., $I\nu[\![\sigma^{U}(e)]\!] = \sigma^{*}_{\omega} I\nu[\![e]\!]$ for all $U$-variations $\nu$ of $\omega$.


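Lemma 17 (Semantic uniform substitution). The formula $\phi$ and the program $\alpha$ have equal truth value and semantics, respectively, over $U$-variations under uniform substitution $\sigma^{U}$ and adjoint interpretation $\sigma^{*}_{\omega} I$, i.e.,

1. for all $U$-variations $\nu$ of $\omega$: $I\nu \vDash \sigma^{U}(\phi)$ iff $\sigma^{*}_{\omega} I\nu \vDash \phi$
2. for all $(U \cup W)$-variations $\nu$ of $\omega$: $(\nu, \tau, o) \in I[\![\sigma^{U,W}_{Z}(\alpha)]\!]$ iff $(\nu, \tau, o) \in \sigma^{*}_{\omega} I[\![\alpha]\!]$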


3.2 Uniform Substitution Proof Rule 


The proof rule US for uniform substitution is the single point of truth for the sound instantiation of axioms (plus renaming of bound variables [30] and written channels, e.g., $[x := 0]\psi(x)$ to $[y := 0]\psi(y)$ and $[ch(h)?x]\psi(ch)$ to $[dh(h)?x]\psi(dh)$). Soundness of the rule, i.e., that validity of its premise implies validity of the conclusion, immediately follows from Lemma 17. Since the substitution process starts with no taboos, $\sigma(\phi)$ is short for $\sigma^{\emptyset}(\phi)$. If the substitution clashes, i.e., $\sigma^{\emptyset}(\phi)$ is not defined, then rule US is not applicable.


Theorem 18 (US is sound). The proof rule US is sound. 


US: $\dfrac{\phi}{\sigma(\phi)}$

Unlike dL [30] and dGL [32], dL_CHP has a context-sensitive syntax for programs and formulas (see Definition 2 and Definition 4). By Proposition 19, uniform substitution, however, preserves syntactic well-formedness. Since all axioms in Sect. 4 will be well-formed, only well-formed formulas can be derived in dL_CHP.


Proposition 19 (US preserves well-formedness). The result $\sigma^{U}(\phi)$ (if defined) of applying uniform substitution to a well-formed formula $\phi$ is well-formed.


4 Axiomatic Proof Calculus 


Figure 3 presents a sound proof calculus for dL_CHP. The significant difference to dL_CHP's schematic calculus [6] is that it completely abandons soundness-critical side conditions, internalizing them syntactically in the axioms. Only axiom []_WA was adjusted to obtain a symbolic representation, and an ac-version K_ac of modal modus ponens is included. Now, distribution of ac-boxes over conjuncts []_ac∧ and ac-monotonicity M[·]_ac derive from K_ac, thus are dropped. Except for these small changes, soundness is inherited from the schematic axioms [6].

Algebraic laws for reasoning about traces [6] can be easily adapted to uniform 
substitution as well [7]. Decidable first-order real arithmetic [41] and Presburger 
arithmetic [34] have corresponding oracle proof rules [6]. 


Remark 20. To obtain a truly finite list of axioms from Fig. 3, symbolic (co)finite 
sets can be finitely axiomatized as a boolean algebra together with extensionality, 
which can be unrolled to a finite disjunction for (co)finite sets [7]. 


Parallel Composition. The parallel injection axiom [∥_]ac in Fig. 3 decomposes parallel CHPs by local abstraction (B II). Unlike dL_CHP's [6] and Hoare-style [46,47] schematic calculi for ac-reasoning, axiom [∥_]ac internalizes the noninterference property [6, Def. 7] that determines valid instances of formula

$[\alpha]_{\{A,C\}}\psi \to [\alpha \parallel \beta]_{\{A,C\}}\psi$    (1)

purely syntactically. To focus on noninterference, $a(Y_a, Z_a) \parallel_{wf} b(Y_b, Z_b)$ abbreviates the well-formed parallel composition $a(Y_a, Z_a) \parallel b(Y_b, (Z_b \cap Z_a^{\complement}) \cup \{\mu, \mu'\} \cup V_T)$ using operator $\parallel_{wf}$ for program constants $a(Y_a, Z_a)$, $b(Y_b, Z_b)$. This notation ensures disjoint parallel state except for the global time $\mu, \mu'$ and recorder variables $V_T$.

Intuitively, axiom [∥_]ac restricts $\beta$ in Eq. (1) such that $\alpha$ overapproximates the behavior of $\alpha \parallel \beta$ influencing $A$, $C$, or $\psi$. For this purpose, noninterference internalized in $b(Y_b \cap (Y^{\complement} \cup Y_a), Z^{\complement})$ forbids $b$ to bind variables $Z$ that are free in the postcondition $p(Y, Z)$, and $Y^{\complement}$ forbids $b$ to bind channels $Y$ (except for channels $Y_a$ written by $a$ because joint parallel communication can already be observed from $a$, too). Moreover, parallel programs always agree on the global time $\mu, \mu'$ and the communication recorded by trace variables $V_T$. Therefore, the operator $\parallel_{wf}$ explicitly allows their sharing even if $Z^{\complement}$ disallows it. Note that $Y_a$ and $Y$, and $Z_a$ and $Z$ may overlap but can also be disjoint.
[Figure 3: table of axioms and proof rules. The entries below could be recovered from the extracted text; the remaining entries ([]_WA, [μ], [ch!]ac, [ch?]ac, [ε]ac, I_ac, and the rules G_ac, MP, ∀, CE) did not survive extraction.]

[:=]    $[x := g^{\mathbb{R}}]p(x) \leftrightarrow p(g^{\mathbb{R}})$ (a)
[:*]    $[x := *]p(x) \leftrightarrow \forall x\, p(x)$
[?]     $[?q_{\mathbb{R}}]p \leftrightarrow (q_{\mathbb{R}} \to p)$ (a)
[]_{⊤,⊤}  $[a]P \leftrightarrow [a]_{\{\top,\top\}}P$
[;]ac   $[a; b]_{\{R,Q\}}P \leftrightarrow [a]_{\{R,Q\}}[b]_{\{R,Q\}}P$
[∪]ac   $[a \cup b]_{\{R,Q\}}P \leftrightarrow [a]_{\{R,Q\}}P \land [b]_{\{R,Q\}}P$
[*]ac   $[a^*]_{\{R,Q\}}P \leftrightarrow [a^0]_{\{R,Q\}}P \land [a]_{\{R,Q\}}[a^*]_{\{R,Q\}}P$ (b)
[∥_]ac  $[a(Y_a, Z_a)]_{\{R,Q\}}P(Y, Z) \to [a(Y_a, Z_a) \parallel_{wf} b(Y_b \cap (Y^{\complement} \cup Y_a), Z^{\complement})]_{\{R,Q\}}P(Y, Z)$ (d)
[ch!]   $[ch(h)!g^{\mathbb{R}}]p(ch, h) \leftrightarrow \forall h_0\, (h_0 = h \cdot \langle ch, g^{\mathbb{R}}, \mu\rangle \to p(ch, h_0))$
W[]ac   $[a]_{\{R,Q\}}P \leftrightarrow Q \land [a]_{\{R,Q\}}(Q \land (R \to P))$
K_ac    $[a]_{\{R,Q_1 \to Q_2\}}(P_1 \to P_2) \to ([a]_{\{R,Q_1\}}P_1 \to [a]_{\{R,Q_2\}}P_2)$

P; = p;(Y, Z), and Rj = r;(Y,h), and Q; = q;(Y,h), and Ẹ = x(ch, h), where j may 
be blank, and Y C 92, Z C Vr U V7, and h C V7 are (co)finite. 


a Replacements for function symbol $g^{\mathbb{R}}$ and predicate symbol $q_{\mathbb{R}}$ are restricted to polynomials in $V_{\mathbb{R}}$ and first-order real arithmetic, respectively.

b Recall that $[a^0]_{\{R,Q\}}P \leftrightarrow Q \land (R \to P)$ by [ε]ac and [?] since $a^0 \equiv\ ?\top$.

c $W_A$ is the compositionality condition $(R \land Q_1 \to R_2) \land (R \land Q_2 \to R_1)$.

d The operator $\parallel_{wf}$ abbreviates well-formed parallel composition (see above).


Fig. 3. dL_CHP proof calculus


Despite its asymmetric shape, axiom [∥_]ac decomposes $[\alpha \parallel \beta](\phi \land \psi)$ into $[\alpha]\phi$ and $[\beta]\psi$ (if they mutually do not interfere) via independent proofs for $[\alpha \parallel \beta]\phi$ and $[\alpha \parallel \beta]\psi$, which drop either $\alpha$ or $\beta$ by [∥_]ac modulo commutativity.


Axiom System. For each program statement, there is either a dynamic or an ac-axiom because the respective other version derives by axiom []_{⊤,⊤} or [ε]ac. Axioms [:=], [:*], and [?] are as in dL [30]. Axioms [;]ac, [∪]ac, and [*]ac for decomposition, and I_ac for induction carefully generalize their versions in differential [30] dynamic [14] logic to ac-reasoning. Sending is handled step-wise via flattening the assumption-commitments by axiom [ch!]ac and axiom [ch!] that executes the effect onto the recorder $h$. The duality [ch?]ac turns receiving into arbitrary sending, which only synchronizes if it agrees with the parallel context on the value. Usage of axiom W[]ac is for convenience. Axiom [μ] materializes the flow of global time $\mu$ such that dL's axiomatization of continuous evolution [30] becomes applicable, which requires ODE shape $x' = f^{\mathbb{R}}(x)$. The axiomatic proof rules G_ac, MP, ∀, and CE are an ac-version of Gödel's generalization rule, modus ponens, quantifier elimination, and contextual equivalence, respectively.

The axiom []_WA can weaken assumptions. Its slight change compared to dL_CHP's schematic calculus [6] exploits that the compositionality condition $W_A$ is only required for $\alpha$'s reachable worlds. Interestingly, dL_CHP's monotonicity rule M[·]_ac [6] does not derive from modal modus ponens K_ac and Gödel generalization G_ac in analogy to dL [30] but needs W[]ac handling monotonicity of
assumptions, which does not fit into Gac because necessitating the assumption 
in Gac would render the derivation of [a], r} T by Gac impossible. 

Axioms using postcondition $P \equiv p(Y, Z)$, e.g., in [;]ac, allow any replacement of $P$ since accessed channels $Y \subseteq \Omega$ and free variables $Z \subseteq V_{\mathbb{R}} \cup V_T$ can be arbitrary. Replacements of assumptions $R \equiv r(Y, h)$ and commitments $Q \equiv q(Y, h)$ can instead only mention trace variables $h \subseteq V_T$ bound in their context. This reflects that trace variables are the only interface between the program $\alpha$ and the ac-formulas $A$ and $C$ in an ac-box $[\alpha]_{\{A,C\}}\psi$ (well-formedness).


Theorem 21 (Soundness). The proof calculus for dL_CHP presented in Fig. 3 is sound as an instantiation of the schematic calculus [6].


Clashes. Clashes sort out unsound instantiations of axioms. Unlike in dL and dGL [30,32], whose clashes are solely due to tabooed variables in terms and formulas, clashes in dL_CHP can also be due to tabooed channels, and even due to taboos in programs. For example, the substitution $\sigma = \{a \mapsto gh(h)!1,\ b \mapsto ch(h)!2,\ p \mapsto \psi,\ r \mapsto \top,\ q \mapsto \top\}$ with $\psi \equiv |h \downarrow ch| > 0 \land |h \downarrow dh| > 0 \land y < 0$ clashes below, where $Y = \{ch, dh\}$, and $Z = \{h, y\}$, and $R \equiv r(Y)$, and $Q \equiv q(Y)$. Writing channel $ch$ in the replacement for $b$ would break the local abstraction of $a$ as $ch$ is accessed in $\psi$ but not written in the replacement for $a$, thus the clash indeed sorts out an unsound instantiation.

US (clash):
$\dfrac{[a(\{gh\}, h)]_{\{R,Q\}}P(Y, Z) \to [a(\{gh\}, h) \parallel_{wf} b(\{ch\} \cap (Y^{\complement} \cup \{gh\}), Z^{\complement})]_{\{R,Q\}}P(Y, Z)}{[gh(h)!1]_{\{\top,\top\}}\psi \to [gh(h)!1 \parallel ch(h)!2]_{\{\top,\top\}}\psi}$

In contrast, $\sigma = \{a \mapsto ch(h)?x;\, gh(h)!1,\ b \mapsto ch(h)!2,\ p \mapsto \psi,\ r \mapsto \top,\ q \mapsto \top\}$ does not clash below, where $Y = \{ch, dh\}$, and $Y_a = \{ch, gh\}$, and other abbreviations are as above, because $ch \in Y^{\complement} \cup Y_a = \{dh\}^{\complement}$. Intuitively, the $ch$-communication of $b$ remains observable after dropping $b$ from the parallel composition as it is joint with $a$.

US, applied to [∥_]ac:
$\dfrac{[a(Y_a, \{h, x\})]_{\{R,Q\}}P(Y, Z) \to [a(Y_a, \{h, x\}) \parallel_{wf} b(\{ch\} \cap (Y^{\complement} \cup Y_a), Z^{\complement})]_{\{R,Q\}}P(Y, Z)}{[ch(h)?x;\, gh(h)!1]_{\{\top,\top\}}\psi \to [(ch(h)?x;\, gh(h)!1) \parallel ch(h)!2]_{\{\top,\top\}}\psi}$
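
The channel part of the two instantiations above can be checked mechanically. The following Python sketch is only an illustration under simplifying assumptions: it models nothing but the channel argument $Y_b \cap (Y^{\complement} \cup Y_a)$ of $b$ in axiom [∥_]ac, represents the cofinite set $Y^{\complement}$ implicitly through a membership test, and uses the hypothetical helper names `b_may_write` and `channel_clash`.

```python
def b_may_write(channel, post_channels, a_channels):
    """True iff channel lies in Y^C ∪ Y_a, i.e., b is allowed to write it.

    post_channels plays the role of Y (channels accessed in the postcondition),
    a_channels plays the role of Y_a (channels written by a).
    """
    return channel not in post_channels or channel in a_channels


def channel_clash(b_written, post_channels, a_channels):
    """True iff the replacement for b writes a channel it must not write."""
    return any(not b_may_write(c, post_channels, a_channels) for c in b_written)


Y = {"ch", "dh"}

# First substitution: a writes only gh, b writes ch, which is accessed in the
# postcondition but not written by a.
first = channel_clash({"ch"}, Y, {"gh"})

# Second substitution: a also communicates on ch, so ch is joint communication.
second = channel_clash({"ch"}, Y, {"ch", "gh"})
```

Here `first` is `True` (the instantiation is rejected) and `second` is `False`, matching the two cases discussed in the text. Variable taboos and the well-formedness check of $\parallel_{wf}$ are deliberately left out of the sketch.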


Also note that by the operator $\parallel_{wf}$ for well-formed parallel composition, the recorder variable $h$ can be shared without causing a clash above. However, clashes prevent instantiation that would violate syntactic well-formedness of programs (Definition 2) by binding the same state variable in parallel:

US (clash):
$\dfrac{[a(\emptyset, \{x\})]_{\{R,Q\}}p(\emptyset, \{x, y\}) \to [a(\emptyset, \{x\}) \parallel_{wf} b(\emptyset, \{x, y\}^{\complement})]_{\{R,Q\}}p(\emptyset, \{x, y\})}{[x := y]_{\{\top,\top\}}\, y = x \to [x := y \parallel x := 0]_{\{\top,\top\}}\, y = x}$

Well-formedness of programs and formulas is ensured in the axioms by well-formed parallel composition $\parallel_{wf}$ and limitation to trace variables $h$ in $R_j \equiv r_j(Y, h)$ and $Q_j \equiv q_j(Y, h)$ in ac-boxes $[a]_{\{R,Q\}}\psi$ in Fig. 3, respectively. By Proposition 19, uniform substitution always preserves well-formedness.


Example 22. The proof tree below decomposes safety (Example 5) of cruise con- 
trol (Example 3) into safety @ of controller ct and branch @ to be continued 
to safety of the vehicle ve. The lower subproof introduces the ac-formulas 


A=C& (|h | tar| > 0 +0 < val(h | tar) < V) 


using axiom []wa to abstract from the communication between ct and ve. The 
upper subproof uses the parallel injection axiom [||_]ac to drop ve. Uniform 
substitution US does not clash as the commitment C only refers to joint com- 
munication of ct and ve. Other applications of US (e.g., for []wa) are omit- 
ted. Rule Prop denotes propositional reasoning. Abbreviations are as follows: 
a = aftar, vt, t,t’, u, u, h), R= r(tar,h), Q = q(tar, h), P = p(tar). 


[Proof tree of Example 22, using the rules and axioms US, [∥_]ac, M[·]_ac, Prop, G_ac, []_ac∧, and []_WA. The tree layout did not survive text extraction.]


5 Related Work 


Uniform substitution for differential dynamic logic dL [30] generalizes Church’s 
uniform substitution for first-order logic [8, §35, 40]. Unlike the lifting from dL 
to differential game logic dGL [31], dL_CHP generalizes into the complementary
direction of communication and parallelism. Unlike schematic calculi [2,19, 27, 
44,46], whose treacherous schematic simplicity relies on encoding all subtlety of 
parallel systems in significant soundness-critical side conditions, our development 
builds upon a minimalistic non-schematic parallel injection axiom and sound 
instantiation encapsulated in uniform substitution. This provides a new, more 
atomic and more modular understanding of parallel systems overcoming the root cause for large soundness-critical prover kernels [5,9,12,16,18,36]. Usage of uniform substitution reduced the kernel of the theorem prover KeYmaera from 105 kLOC to 2 kLOC in KeYmaera X [23]. We expect dL_CHP's integration into KeYmaera X to stay in the same order of magnitude.

To the best of our knowledge, assumption-commitment reasoning [22,46]⁵ has no tool support, which might be due to vast implementation effort. The latter can be underpinned by analogy with tools [5,9,16,18,36] for verification of shared-variables concurrency, some of which use rely-guarantee reasoning [36,39]. Unlike uniform substitution for dL_CHP, which enables a straightforward implementation of a small prover kernel, they all rely on large soundness-critical code bases. Unlike refinement checking for CSP [12] and discrete-time CSP [4], dL_CHP supports safety properties of dense-time hybrid systems. Contrary to our goal of small prover kernels, implementations of model checkers [12] are inherently large.

Beyond embeddings of concurrency reasoning for discrete systems into proof assistants [3,25,26,38], dL_CHP can verify parallel hybrid systems synchronizing in shared global time. The latter imposes even more complicated binding structures than parallel or hybrid systems alone, but dL_CHP's uniform substitution calculus continues to manage them in a modular way.

The recent tool HHLPy [37] for hybrid CSP (HCSP) [17] is limited to the sequential fragment. Unlike extending HHLPy to parallelism, which would require extensive soundness-critical side conditions and a treatment of the duration calculus, integrating dL_CHP into KeYmaera X [11] boils down to adding a finite list of concrete object-level formulas as axioms and only small changes to the uniform substitution process. In contrast to dL_CHP's compositional parallel systems calculus [6], HCSP calculi [13,20,42] are non-compositional [6] as they either unroll exponentially many interleavings from the operational semantics [13,42] or can only decompose independent parallel components [20], causing limited ability to reason about complex systems. Former HCSP tools [43,45] only implement a non-compositional calculus [20], reinforcing the significance of our approach for managing parallel hybrid systems reasoning. Other hybrid process algebras defer to model checkers for reasoning [10,21,40]. Further discussion of dL_CHP is in [6].


6 Conclusion 


This paper introduced a sound one-pass uniform substitution calculus for the dynamic logic of communicating hybrid programs dL_CHP, thereby mastering the significant challenge of developing simple sound proof calculi for parallel hybrid systems with communication. Uniform substitution can separate even notoriously complicated binding structures from parallelism with communication in multi-dynamical logics into axioms and their instantiation. In the case of dL_CHP,


5 Assumption-commitment and rely-guarantee reasoning are specific patterns for 
message-passing and shared variables concurrency, respectively. The broader assume- 
guarantee principle has been used across diverse areas for various purposes. 
this applies to channel access in predicates and the need for local abstraction of subprograms in parallel statements, and it even turns out that uniform substitution can maintain a context-sensitive syntax along the way. Thanks to uniform substitution, parallel systems reasoning reduces to multiple uses of an asymmetric parallel injection axiom.

Now, with uniform substitution, a straightforward implementation of dL_CHP in KeYmaera X is only one step away.
Abstract. We present an Isabelle/HOL formalization of Simple Clause
Learning for first-order logic without equality: SCL(FOL). The main 
results are formal proofs of soundness, non-redundancy of learned 
clauses, termination, and refutational completeness. Compared to the 
unformalized version, the formalized calculus is simpler and more gen- 
eral, some results such as non-redundancy are stronger and some results 
such as non-subsumption are new. We found one bug in a previously 
published version of the SCL Backtrack rule. Compared to related for- 
malizations, we introduce a new technique for showing termination based 
on non-redundant clause learning. 


Keywords: interactive theorem proving · automated theorem proving · first-order logic · CDCL · SCL · non-redundant clause learning


1 Introduction 


The SCL (“Clause Learning from Simple Models” or simply “Simple Clause 
Learning”) family of calculi lifts a conflict-driven clause learning (CDCL) app- 
roach to first-order logic: SCL(FOL) is for first-order logic without equal- 
ity [8,10], SCL(T) is for first-order logic with theories [6], SCL(EQ) is for first- 
order logic with equality [12], and HSCL is for exhaustive partial models explo- 
ration in first-order logic without equality [7]. In its original formulation [10], 
SCL(FOL) required exhaustive propagation and a precise strategy for the appli- 
cation of the rules in order to learn non-redundant clauses. This was improved 
upon by SCL(T) [6] by dropping exhaustive propagation and weakening the 
strategy, i.e., any run according to the strategy in [10] is also a run according 
to the strategy in [6]. The SCL(FOL) version presented in Bromberger et al. [8] 
integrates those changes and additionally refines the Backtrack rule. 

We present an Isabelle/HOL formalization of the non-executable specifica- 
tion of SCL(FOL) based on and developed in parallel to Bromberger et al. The 
main results are soundness, non-redundancy of learned clauses, termination, and refutational completeness. In contrast to the goal of Bromberger et al. to guide toward an implementation, our goal is to be as simple and general as possible. For that, we (i) simplified the calculus (e.g., no more explicit tracking of decision levels), (ii) generalized the calculus (e.g., multiple acceptable positions in the Backtrack rule), (iii) strengthened existing theorems (e.g., Theorem 11 on non-redundancy), and (iv) proved new theorems (e.g., Corollary 12 on non-subsumption).

This work is part of the IsaFoL (Isabelle Formalization of Logic) effort [2], which aims at developing a library of results about logical calculi. The Isabelle theory files are available in the Archive of Formal Proofs (AFP) [9] and amount to more than 11000 lines of source text. They build heavily upon many other entries of the AFP: (i) First_Order_Terms [17] for first-order terms, term substitutions, and MGU; (ii) Ordered_Resolution_Prover [14-16] for the clausal calculus, clause substitutions, Herbrand interpretation, and compactness of first-order logic; and (iii) Saturation_Framework_Extensions [5,18] for entailment of the clausal calculus. We contributed many lemmas and definitions back to both the Isabelle distribution and the aforementioned AFP entries (e.g., over 50 to First_Order_Terms). We made heavy use of the Isar language [19] to write structured proofs, the Sledgehammer tool [13] for proof automation, and locales [1] (Isabelle's parameterized module system) to structure our development and reuse existing components from the AFP entries. To ease associating the main results in this paper with their counterparts in the Isabelle development, names in monospace are taken verbatim from the formalization.

The formalization follows the basic ideas of the existing formalizations of the first-order resolution calculus [16] and propositional CDCL calculi [3,4]. Compared to propositional logic, first-order logic adds a number of challenges: the extra term level requires considering variables, substitutions, groundings, and the concept of factorization. To preserve completeness, propagation of ground literals must no longer be exhaustive, resulting in a level-wise exploration w.r.t. a bounding atom. Inside this bound, the calculus always terminates. If one level does not suffice to find a refutation, the bound can be increased and exploration can be continued. For unsatisfiable formulas, we prove the existence of a bound sufficient to derive ⊥, which guarantees that only finitely many levels need to be explored.

The paper is organized as follows. Section 2 recaps the SCL(FOL) calculus from Bromberger et al. as the basis of our formalization presented in Sect. 3.
We first present the Isabelle formalization of the abstract rules of the SCL(FOL) 
calculus. Then we prove invariants preserved by the rules starting from the initial 
state, Lemma 1. Subsequently, we prove soundness, Theorem 7, non-redundancy 
of learned clauses, Theorem 11, termination with respect to a fixed bound, The- 
orem 18, and finally refutational completeness with respect to an appropriate 
bound, Theorem 20. We discuss important aspects of the formalization and proof 
ideas here and refer the reader to the formalization for more details. The paper 
ends with a short conclusion of the obtained results. 
2 The SCL(FOL) Calculus


We briefly recall basic first-order logic notions and the SCL(FOL) calculus presented in Bromberger et al. We consider an untyped first-order logic without equality. A term is defined inductively as either a variable $x$ or a function application $f(\bar{t})$ for a constant $f$ and a (possibly empty) list of terms $\bar{t}$. An atom is a predicate symbol applied to a list of term arguments. A literal is either a positive atom $A$ or a negative atom $\neg A$. For literals we write $L$ or $K$. The atom of a literal may be selected with $\mathrm{atom}(A) = A$ and $\mathrm{atom}(\neg A) = A$. The complement of a literal is defined as $\mathrm{comp}(A) = \neg A$ and $\mathrm{comp}(\neg A) = A$. A disjunctive clause is a finite multiset of literals. For clauses we write $C$ or $D$. We use the syntax $L \lor C$ and $C \lor D$ synonymously with the multiset sums $\{L\} + C$ and $C + D$, respectively. We also use the syntax $\bot$ synonymously with the empty multiset $\{\}$. All variables in clauses are to be understood as universally quantified.

Substitutions are total unary functions from variables to terms. A substitution $\sigma$ may be applied to a variable $x$, a term $t$, an atom $A$, a literal $L$, or a clause $C$, denoted $x\sigma$, $t\sigma$, $A\sigma$, $L\sigma$, or $C\sigma$, respectively. Substitution application is left-associative, i.e., $C\sigma_1\sigma_2 = (C\sigma_1)\sigma_2$. The domain of a substitution $\sigma$ is defined as $\mathrm{dom}(\sigma) = \{x \mid x\sigma \neq x\}$. The composition of two substitutions $\sigma_1$ and $\sigma_2$ is defined as the function $\sigma_1 \circ \sigma_2 = (\lambda x.\ x\sigma_1\sigma_2)$. A substitution $\gamma$ is a grounding for a term $t$, an atom $A$, a literal $L$, or a clause $C$ if $t\gamma$, $A\gamma$, $L\gamma$, or $C\gamma$ are respectively ground, i.e., if they do not contain variables. A substitution $\rho$ is a renaming if it is injective and $x\rho$ is a variable for all variables $x$. The inverse of a renaming $\rho$ is any function $\rho^{-1}$ from terms to variables such that $\rho^{-1}(x\rho) = x$ for all variables $x$. The restriction of a substitution $\sigma$ to a set of variables $V$ is defined as the function $(\lambda x.\ \text{if } x \in V \text{ then } x\sigma \text{ else } x)$. A substitution $\sigma$ is idempotent if $\sigma \circ \sigma = \sigma$. A substitution $\upsilon$ is a unifier for a set of terms $T$ if $t_1\upsilon = t_2\upsilon$ for all terms $t_1 \in T$ and $t_2 \in T$. A substitution $\mu$ is a most general unifier (MGU) for a set of terms $T$ if $\mu$ is a unifier for $T$ and for every unifier $\upsilon$ for $T$ there exists a substitution $\sigma$ such that $\mu \circ \sigma = \upsilon$. A substitution $\mu$ is an idempotent most general unifier (IMGU) for a set of terms $T$ if $\mu$ is a unifier for $T$ and $\mu \circ \upsilon = \upsilon$ for all unifiers $\upsilon$ for $T$; note that $\mu$ is an IMGU iff it is both idempotent and an MGU.

When formalizing logical calculi, IMGUs are preferable because they allow applying groundings to a term both directly and after applying an IMGU, i.e., $t\gamma = t\mu\gamma$ for all terms $t$, IMGUs $\mu$, and groundings $\gamma$ that are themselves unifiers of the terms in question. Non-idempotent MGUs do not have this property, as the following counterexample shows. Consider the terms $t_1 = f(x, y, z)$ and $t_2 = f(w, y, z)$, the grounding $\gamma = \{x \mapsto a, y \mapsto b, z \mapsto c, w \mapsto a\}$, and the non-idempotent MGU $\mu = \{x \mapsto w, y \mapsto z, z \mapsto y\}$, where $x, y, z, w$ are variables and $a, b, c$ are ground constants. Then we have $t_1\gamma = f(a, b, c) \neq f(a, c, b) = t_1\mu\gamma$. In the published literature, an IMGU is often meant instead of an MGU; the idempotency requirement is often kept implicit because standard implementations for computing MGUs actually produce IMGUs.
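
The counterexample can be replayed concretely. The following Python sketch is not taken from the Isabelle development: the term encoding (variables as strings, applications as tuples) and the helper names `apply`, `compose`, and `is_idempotent` are assumptions made for illustration. Composition follows the left-to-right convention $\sigma_1 \circ \sigma_2 = (\lambda x.\ x\sigma_1\sigma_2)$ used above.

```python
def apply(subst, term):
    """Apply a substitution (dict from variable names to terms) to a term.

    Variables are strings; applications are tuples (symbol, arg1, ..., argn).
    """
    if isinstance(term, str):
        return subst.get(term, term)
    symbol, *args = term
    return (symbol, *[apply(subst, arg) for arg in args])


def compose(s1, s2):
    """Return s1 ∘ s2 in the sense x ↦ (x s1) s2."""
    result = {x: apply(s2, t) for x, t in s1.items()}
    for x, t in s2.items():
        result.setdefault(x, t)
    return result


def is_idempotent(subst):
    twice = compose(subst, subst)
    return all(apply(twice, x) == apply(subst, x) for x in subst)


a, b, c = ("a",), ("b",), ("c",)
t1 = ("f", "x", "y", "z")
t2 = ("f", "w", "y", "z")
gamma = {"x": a, "y": b, "z": c, "w": a}
mu = {"x": "w", "y": "z", "z": "y"}

mu_unifies = apply(mu, t1) == apply(mu, t2)          # mu is a unifier
gamma_unifies = apply(gamma, t1) == apply(gamma, t2)  # so is gamma
direct = apply(gamma, t1)                             # t1 γ
via_mu = apply(gamma, apply(mu, t1))                  # t1 μ γ
```

With these definitions, `mu` unifies the two terms but is not idempotent, and `direct` and `via_mu` differ in the order of the last two arguments, exactly as in the counterexample.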

The function $\mathrm{gnd}(C) = \{C\gamma \mid C\gamma \text{ is ground}\}$ expresses the set of all groundings of a clause $C$. The function $\mathrm{gnd}(N) = (\bigcup C \in N.\ \mathrm{gnd}(C))$ expresses the set of all groundings of a set of clauses $N$; its subset whose clauses are restricted to atoms less than or equal to a bound $\beta$ w.r.t. an order $\prec_B$ is defined as $\mathrm{gnd}_{\preceq_B \beta}(N) = \{C \in \mathrm{gnd}(N) \mid \forall L \in C.\ \mathrm{atom}(L) \preceq_B \beta\}$. Note that $\mathrm{gnd}(\mathrm{gnd}(N)) = \mathrm{gnd}(N)$. The strict order $\prec_B$ is total on ground literals and is such that for each $\beta$ there are only finitely many literals $L$ with $L \prec_B \beta$. An example of such an order could be KBO without zero-weight symbols. Note that LPO does not satisfy the last condition of a $\prec_B$ order although it is a well-founded and total order.

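Herbrand entailment is defined as $(I \vDash_H N \longleftrightarrow (\forall C \in N.\ I \vDash_H C))$ for a set of clauses $N$, $(I \vDash_H C \longleftrightarrow (\exists L \in C.\ I \vDash_H L))$ for a clause $C$, and $(I \vDash_H A \longleftrightarrow A \in I)$ and $(I \vDash_H \neg A \longleftrightarrow A \notin I)$ for a literal with atom $A$; note that the symbol $\vDash_H$ is overloaded. Ground entailment is defined as $(N_1 \vDash_G N_2 \longleftrightarrow (\forall I.\ I \vDash_H N_1 \longrightarrow I \vDash_H N_2))$. First-order entailment is defined as $(N_1 \vDash N_2 \longleftrightarrow \mathrm{gnd}(N_1) \vDash_G \mathrm{gnd}(N_2))$. A set of ground clauses $N$ is satisfiable if there exists a Herbrand interpretation $I$ such that $I \vDash_H N$; otherwise, it is unsatisfiable.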
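
For finite sets of ground clauses, these definitions can be executed directly. The following Python sketch is an illustration only and not part of the formalization: the encoding of a ground literal as a pair `(positive, atom)`, the representation of clauses as lists, and all function names are assumptions. Satisfiability is decided by brute-force enumeration of the interpretations over the atoms that occur in the clause set, which suffices because atoms that do not occur cannot affect the truth of any clause.

```python
from itertools import combinations


def lit_holds(interp, lit):
    """I ⊨ A iff A ∈ I, and I ⊨ ¬A iff A ∉ I."""
    positive, atom = lit
    return (atom in interp) == positive


def clause_holds(interp, clause):
    """A clause holds iff some literal in it holds."""
    return any(lit_holds(interp, lit) for lit in clause)


def clauses_hold(interp, clauses):
    """A clause set holds iff every clause in it holds."""
    return all(clause_holds(interp, clause) for clause in clauses)


def satisfiable(clauses):
    """Search all interpretations over the occurring atoms (exponential)."""
    atoms = sorted({atom for clause in clauses for _, atom in clause})
    for size in range(len(atoms) + 1):
        for chosen in combinations(atoms, size):
            if clauses_hold(set(chosen), clauses):
                return True
    return False
```

For example, `satisfiable([[(True, "P"), (True, "Q")], [(False, "P")]])` holds with the interpretation `{"Q"}`, whereas a clause set containing the empty clause is never satisfiable.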

An annotated literal is the pairing of a literal with an annotation. We call it a decision literal when the annotation is a natural number $n$ indicating the literal's level (i.e., that it is the $n$th decision) and a propagation literal when the annotation is a closure of the clause the literal originated from. The literal of an annotated literal $\mathcal{K}$ is denoted $\mathrm{lit}(\mathcal{K})$ and the annotation is denoted $\mathrm{ann}(\mathcal{K})$. The level of a clause is the maximum level of its literals. A trail is a finite sequence of annotated ground literals; it grows from left to right. The empty trail is written $\epsilon$ and appending a new annotated literal $\mathcal{K}$ to a trail $\Gamma$ is written $\Gamma, \mathcal{K}$. The concatenation of two trails $\Gamma_1$ and $\Gamma_2$ is written $\Gamma_1, \Gamma_2$. A trail $\Gamma$ can be converted to a set with $\mathrm{set}(\Gamma)$.

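A literal $L$ is true under trail $\Gamma$ if $L \in \{\mathrm{lit}(\mathcal{K}) \mid \mathcal{K} \in \mathrm{set}(\Gamma)\}$. A literal $L$ is false under trail $\Gamma$ if $\mathrm{comp}(L) \in \{\mathrm{lit}(\mathcal{K}) \mid \mathcal{K} \in \mathrm{set}(\Gamma)\}$. A literal $L$ is defined in a trail $\Gamma$ if $L$ is true or false under $\Gamma$; otherwise, it is undefined. A clause $C$ is true under trail $\Gamma$ if $(\exists L \in C.\ L \text{ is true under } \Gamma)$. A clause $C$ is false under trail $\Gamma$ if $(\forall L \in C.\ L \text{ is false under } \Gamma)$. A clause $C$ is defined in a trail $\Gamma$ if $(\forall L \in C.\ L \text{ is defined in } \Gamma)$; otherwise, it is undefined.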
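
These trail predicates translate almost literally into code. The following Python sketch is an illustration under assumed encodings, not the Isabelle definitions: a ground literal is a pair `(positive, atom)`, a trail is a list of pairs `(literal, annotation)`, and the function names are hypothetical.

```python
def comp(lit):
    """Complement of a literal."""
    positive, atom = lit
    return (not positive, atom)


def trail_literals(trail):
    """The set of literals on the trail, ignoring annotations."""
    return {lit for lit, _annotation in trail}


def lit_true(trail, lit):
    return lit in trail_literals(trail)


def lit_false(trail, lit):
    return comp(lit) in trail_literals(trail)


def lit_defined(trail, lit):
    return lit_true(trail, lit) or lit_false(trail, lit)


def clause_true(trail, clause):
    return any(lit_true(trail, lit) for lit in clause)


def clause_false(trail, clause):
    return all(lit_false(trail, lit) for lit in clause)


def clause_defined(trail, clause):
    return all(lit_defined(trail, lit) for lit in clause)


# A trail with the decision P (level 1) and a propagated ¬Q; the annotation of
# the propagated literal is a placeholder string in this sketch.
trail = [((True, "P"), 1), ((False, "Q"), "reason")]
```

Note that a clause can be true under a trail without being defined in it (for instance $P \lor R$ under the trail above), and that the empty clause is false under every trail.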

The SCL(FOL) calculus is defined as a transition system operating on states $(\Gamma; N; U; \beta; k; C)$ where $\Gamma$ is a trail, $N$ is a finite set of initial clauses, $U$ is a finite set of learned clauses, $\beta$ is a bounding atom restricting the considered ground literals, $k$ is a natural number counting the number of decisions taken in $\Gamma$, and $C$ is either $\top$ or a clause closure $(C \cdot \gamma)$ such that $C\gamma$ is ground and false in $\Gamma$. The initial state is $(\epsilon; N; \emptyset; \beta; 0; \top)$ for some initial clause set $N$ and bound $\beta$.

The transition relation $\Rightarrow_{\mathrm{SCL}}$ is a mapping between states. The rules below are from Bromberger et al. and serve as a reference for the Isabelle formalization described in Sect. 3.


Propagate (I;N;U;ß;k;T) >scp (T, Ly{CovV))47); N: U; B; k; T) 

if (Cv L) € (NUU), C = CoV C1, Ciy = LyV---V Ly, Coy does not contain Ly, 
u is the IMGU of the literals in C1 and L, (C V L)y is ground, (CV L)y <B {8}, 
Coy false under I’, and Ly is undefined in I’. 

Decide (T; N;U; 6; k; T) =>scu (T, Ly***; N; U; 85k +1;T) 

if L € C for a C € (NUU), Ly is a ground literal undefined in I’, and Ly <p B. 
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Conflict (1;N;U;6;k;T) sox (T; N; U; 8; k; (C;7)) 
if C € (N U U), Cy is false under I for a grounding substitution y. 

These rules construct a (partial) model via the trail [ for NUU until a 
conflict, i.e., a clause false under I’ is found. The above rules always terminate, 
because there are only finitely many ground literals L with L <p £. It might be 
necessary to successively increase (7 for full refutational completeness. 


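Skip $(\Gamma, K; N; U; \beta; k; (C; \gamma)) \Rightarrow_{\text{SCL}} (\Gamma; N; U; \beta; k - i; (C; \gamma))$
if $\mathrm{comp}(K)$ does not occur in $C\gamma$; if $K$ is a decision literal then $i = 1$, otherwise
$i = 0$.


Factorize $(\Gamma; N; U; \beta; k; ((C \lor L \lor L'); \gamma)) \Rightarrow_{\text{SCL}} (\Gamma; N; U; \beta; k; ((C \lor L)\mu; \gamma))$
if $L\gamma = L'\gamma$ and $\mu = \mathrm{IMGU}(L, L')$.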

Note that this rule may be used multiple times if the conflicting clause con- 
tains more than two duplicates of a given literal or if multiple distinct literals 
have duplicates. 


Resolve $(\Gamma, K\gamma_D^{(D \lor K)\cdot\gamma_D}; N; U; \beta; k; ((C \lor L); \gamma_C))$
$\Rightarrow_{\text{SCL}} (\Gamma, K\gamma_D^{(D \lor K)\cdot\gamma_D}; N; U; \beta; k; ((C \lor D)\mu; \gamma_C \circ \gamma_D))$
if $K\gamma_D = \mathrm{comp}(L\gamma_C)$ and $\mu = \mathrm{IMGU}(K, \mathrm{comp}(L))$.
The clauses $D \lor K$ and $C \lor L$ are assumed to have disjoint variables.


Backtrack $(\Gamma_0, K, \Gamma_1, \mathrm{comp}(L\gamma)^{k}; N; U; \beta; k; ((C \lor L); \gamma))$

$\Rightarrow_{\text{SCL}} (\Gamma_0; N; U \cup \{C \lor L\}; \beta; j; \top)$
if $C\gamma$ is of level $i' < k$, and $\Gamma_0, K$ is the minimal trail subsequence such that
there is a grounding substitution $\gamma'$ with $(C \lor L)\gamma'$ false under $\Gamma_0, K$ but not
in $\Gamma_0$, and $\Gamma_0$ is of level $j$.

The clause $C \lor L$ added by the rule Backtrack to $U$ is called a learned clause.
The empty clause $\bot$ can only be generated by rule Resolve or be already present
in $N$; hence, as usual for CDCL-style calculi, the generation of $\bot$ together with
the clauses in $N \cup U$ represents a resolution refutation.

A sequence of SCL rule applications is called a reasonable run if the rule 
Decide does not enable an immediate application of rule Conflict. A sequence 
of SCL rule applications is called a regular run if it is a reasonable run and the 
rule Conflict has precedence over all other rules. 


3 Formalization of the SCL(FOL) Calculus 


The formalization introduces some new concepts absent from Sect. 2. A multiset
$C$ can be converted to a set, i.e., without duplicates, with $\mathrm{set}(C)$. The
multiplicity of an element $x$ in a multiset $C$ is denoted by $\mathrm{count}(C, x)$. The
cardinality of a multiset, i.e., the sum of the multiplicities of its elements, is denoted
by $|C|$. The multiset whose only element is $x$ with multiplicity $n$ is denoted by
$\mathrm{repeat}(n, x)$; note that $\mathrm{count}(\mathrm{repeat}(n, x), x) = n$, and $\mathrm{set}(\mathrm{repeat}(n, x)) = \{x\}$
if $n > 0$. The multiset extension of an order on literals extends the order to
multisets containing literals; we use the Huet-Oppen specification [11], one of
several equivalent alternatives for this extension. The adaptation of a substitution
$\sigma$ to a renaming $\rho$ is a function whose domain is the renamed domain
of $\sigma$ and whose codomain is the same as that of $\sigma$; it is defined as the function
$(\lambda x.\ \text{if } x \in \{y\rho \mid y \in \mathrm{dom}(\sigma)\} \text{ then } (\rho^{-1} x)\sigma \text{ else } x)$. A substitution $\gamma$ is a merged
grounding of a grounding $\gamma_A$ for a set of variables $A$ and a grounding $\gamma_B$ for
a set of variables $B$ if $(A \cap B = \{\} \longrightarrow (\forall x \in A.\ x\gamma_A \text{ is ground}) \longrightarrow (\forall x \in
B.\ x\gamma_B \text{ is ground}) \longrightarrow (\forall x \in A.\ x\gamma = x\gamma_A) \land (\forall x \in B.\ x\gamma = x\gamma_B))$; an example
of a function that fulfills this specification is $(\lambda x.\ \text{if } x \in A \text{ then } x\gamma_A \text{ else } x\gamma_B)$.
The length of a trail $\Gamma$ is denoted by $|\Gamma|$. The $n$th right-most element of a
trail $\Gamma$ is denoted by $\Gamma[n]$; we use zero-based indexing where the right-most
element is the 0th element. The Herbrand interpretation of a trail $\Gamma$ is defined
as $\mathrm{HI}(\Gamma) = (\bigcup K \in \mathrm{set}(\Gamma).\ \text{case } \mathrm{lit}(K) \text{ of } A \Rightarrow \{A\} \mid \neg A \Rightarrow \{\})$.
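
The multiset operations and the example merged grounding can be mimicked in a few lines of Python. This is only a sketch: multisets are modelled with `collections.Counter`, substitutions are modelled as dictionaries from variable names to ground terms written as strings, and the function names are chosen here for illustration.

```python
from collections import Counter


def repeat(n, x):
    """Multiset whose only element is x, with multiplicity n."""
    return Counter({x: n}) if n > 0 else Counter()


def count(multiset, x):
    """Multiplicity of x in the multiset (0 if absent)."""
    return multiset[x]


def to_set(multiset):
    """Underlying set of the multiset, i.e., its elements without duplicates."""
    return {x for x, n in multiset.items() if n > 0}


def cardinality(multiset):
    """Sum of the multiplicities of all elements."""
    return sum(multiset.values())


def merged_grounding(vars_a, gamma_a, gamma_b):
    """The example function from the text: use gamma_a on the variables in A
    and gamma_b everywhere else. Unmapped variables are left unchanged."""
    return lambda x: gamma_a.get(x, x) if x in vars_a else gamma_b.get(x, x)


clause = Counter(["P(a)", "P(a)", "Q(b)"])
gamma = merged_grounding({"x"}, {"x": "a"}, {"y": "b"})
print(count(clause, "P(a)"), to_set(clause), cardinality(clause))
print(gamma("x"), gamma("y"))
```

If the variable sets of the two groundings are disjoint, the merged function agrees with each grounding on its own variables, which is exactly what the specification demands.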

The formalization also changes some existing concepts. No distinction is made
between atoms and terms, so first-order terms are used everywhere in place of
atoms. The level annotation of a decision literal is no longer required and is
replaced by a $\dagger$ marker; a decision literal is now written $K^{\dagger} = (K; \dagger)$ for some literal $K$. A
propagation literal is written $(K\gamma_D)^{(D \lor K)\cdot\gamma_D} = (K\gamma_D; (D; K; \gamma_D))$ for some literal
$K$, clause $D$, and grounding $\gamma_D$. Note that the propagated literal is explicitly
separated from its clause in the closure annotation; this eases the formulation
of the additional invariants 5 and 6 of Lemma 1, namely that the respective clause is
always false under the respective trail. For the trail $\Gamma, K$, the Isabelle
formalization uses the constructor List.Cons K $\Gamma$, which actually grows from right to
left. However, we keep the well-established left-to-right convention in this paper
because it significantly eases the presentation. A state is a tuple $(\Gamma; U; \mathcal{C})$ where
$\Gamma$ is a trail, $U$ is a finite set of learned clauses, and $\mathcal{C}$ is an optional clause
closure. The individual components can be selected with $\mathrm{trail}((\Gamma; U; \mathcal{C})) = \Gamma$,
$\mathrm{learned}((\Gamma; U; \mathcal{C})) = U$, and $\mathrm{conflict}((\Gamma; U; \mathcal{C})) = \mathcal{C}$. The initial state is $(\epsilon; \{\}; \top)$,
i.e., empty trail, no learned clauses, and no conflicting closure. The finite set of
initial clauses $N$ and the bounding atom $\beta$ are no longer stored in the state but
are rather parameters of the transition relation; this was done to highlight the
fact that they are never modified by any rule. The natural number $k$ counting the
number of decisions, used in Sect. 2 to determine an appropriate backtracking
point, turned out not to be necessary and was dropped entirely. We assume the
existence of a binary relation on atoms $\prec_{\mathcal{B}}$ such that $(\forall \beta.\ \{t \mid t \prec_{\mathcal{B}} \beta\}$ is finite$)$
but dropped the requirement for $\prec_{\mathcal{B}}$ to be a strict order total on ground terms.
We also do not lift $\prec_{\mathcal{B}}$ to literals and clauses, but always use it at the atom level.
We define the relation $\preceq_{\mathcal{B}}$ as the reflexive closure of $\prec_{\mathcal{B}}$.

The transition relation $\Rightarrow^{N,\beta}_{\text{SCL}}$ is a binary predicate between states and is
parameterized by the finite set $N$ of initial clauses and the bounding atom $\beta$.
It is defined as the disjunction of the following rules. Following each rule, we
highlight the main differences from Sect. 2 not already covered.

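Propagate $(\Gamma; U; \top) \Rightarrow^{N,\beta}_{\text{Propagate}} (\Gamma, (L\mu\gamma)^{(C_0 \lor L)\mu\cdot\gamma}; U; \top)$

if $(L \lor C) \in (N \cup U)$, $\gamma$ is a grounding for $L \lor C$, $(\forall K \in (L \lor C).\ \mathrm{atom}(K\gamma) \preceq_{\mathcal{B}} \beta)$,
$C_0 = \{K \in C \mid K\gamma \neq L\gamma\}$, $C_1 = \{K \in C \mid K\gamma = L\gamma\}$, $C_0\gamma$ is false under $\Gamma$, $L\gamma$
is undefined in $\Gamma$, and $\mu$ is an IMGU for all terms in $\{\mathrm{atom}(K) \mid K \in (L \lor C_1)\}$.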




Compared to Sect. 2, we express the splitting of $C$ into $C_0$ and $C_1$ formally
as set operations and replace $\prec_{\mathcal{B}}$ with $\preceq_{\mathcal{B}}$. This replacement has no effect on the
results, but allowing the bound to be in $\mathrm{gnd}^{\preceq_{\mathcal{B}}\beta}(N)$ eases the proof of Lemma
21, where the largest element of the (finite) unsatisfiable core is directly used as
the new bound. There are also situations where the maximal element of a signature
is required to derive a contradiction: a strict bound requires artificially
extending the signature, while a non-strict bound does not.


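Decide $(\Gamma; U; \top) \Rightarrow^{N,\beta}_{\text{Decide}} (\Gamma, (L\gamma)^{\dagger}; U; \top)$

if $(L \lor C) \in N$, $\gamma$ is a grounding for $L$, $L\gamma$ is undefined in $\Gamma$, and $\mathrm{atom}(L\gamma) \preceq_{\mathcal{B}} \beta$.

Compared to Sect. 2, we replace $\prec_{\mathcal{B}}$ with $\preceq_{\mathcal{B}}$ and take the decision literal
from $N$ instead of $N \cup U$. The ground instances of literals of $U$ are a subset of
the ground instances of literals of $N$, so it is redundant to also consider $U$ here.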


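Conflict $(\Gamma; U; \top) \Rightarrow^{N,\beta}_{\text{Conflict}} (\Gamma; U; (C; \gamma))$
if $C \in (N \cup U)$, $\gamma$ is a grounding for $C$, and $C\gamma$ is false under $\Gamma$.


Skip $(\Gamma, K; U; (C; \gamma)) \Rightarrow^{N,\beta}_{\text{Skip}} (\Gamma; U; (C; \gamma))$
if $\mathrm{comp}(\mathrm{lit}(K)) \notin C\gamma$.


Factorize $(\Gamma; U; ((L' \lor L \lor C); \gamma)) \Rightarrow^{N,\beta}_{\text{Factorize}} (\Gamma; U; ((L \lor C)\mu; \gamma))$
if $L\gamma = L'\gamma$ and $\mu$ is the IMGU for the terms $\mathrm{atom}(L)$ and $\mathrm{atom}(L')$.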


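Resolve $(\Gamma; U; ((L \lor C); \gamma_C)) \Rightarrow^{N,\beta}_{\text{Resolve}} (\Gamma; U; ((C\rho_C \lor D\rho_D)\mu; \gamma))$
if $\Gamma = \Gamma', (K\gamma_D)^{(D \lor K)\cdot\gamma_D}$ and $K\gamma_D = \mathrm{comp}(L\gamma_C)$. Here, $\rho_C$ and $\rho_D$ are renamings
such that the variables of $(L \lor C)\rho_C$ and $(K \lor D)\rho_D$ are disjoint, $\mu$ is the IMGU
for the terms $\mathrm{atom}(L)\rho_C$ and $\mathrm{atom}(K)\rho_D$, $\gamma'_C$ and $\gamma'_D$ are adaptations of $\gamma_C$ and
$\gamma_D$ to the renamings $\rho_C$ and $\rho_D$ respectively, and $\gamma$ is a merged grounding of $\gamma'_C$
for the variables of $(L \lor C)\rho_C$ and $\gamma'_D$ for the variables of $(K \lor D)\rho_D$.

Note that the definition of merged grounding implies the following equalities:
$\mu\gamma = \gamma$, $L\rho_C\gamma = L\gamma_C$, $C\rho_C\gamma = C\gamma_C$, $K\rho_D\gamma = K\gamma_D$, and $D\rho_D\gamma = D\gamma_D$.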

Compared to Sect.2, we explicitly rename the merged clauses to avoid 
variable-name clashes instead of assuming disjoint variables, and use an abstract 
specification for the merged grounding instead of forcing substitution composi- 
tion. The latter makes our rule more general by allowing more freedom to an 
implementation. 


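Backtrack $(\Gamma, \Gamma', K; U; ((L \lor C); \gamma)) \Rightarrow^{N,\beta}_{\text{Backtrack}} (\Gamma; \{L \lor C\} \cup U; \top)$
if $K = \mathrm{comp}(L\gamma)$ and $(\nexists \gamma'.\ (L \lor C)\gamma'$ is ground and false under $\Gamma)$.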

Compared to Sect.2, we allow backtracking to any non-conflicting trail 
instead of specifying the position. This makes our rule more general by, again, 
allowing more freedom to an implementation. The minimally backtracking strat- 
egy introduced in Definition 4 brings back equivalence to the Backtrack rule of 
Sect. 2. 


Isabelle Technicalities. We define the SCL rules in the scl_fol_calculus
locale. It fixes an abstract binary relation $\prec_{\mathcal{B}}$ as a locale parameter and assumes
that it bounds a finite number of atoms. It also fixes an abstract function to
generate variable renamings as a locale parameter and assumes its correctness;
this function is not required for the specification of the calculus but is required
in multiple proofs. Most of the following definitions and theorems are in the
context of this locale. Each SCL rule is defined separately as an inductive predicate.
Having separate definitions allows us to refer to the rules individually in
subsequent definitions and theorems. Using inductive predicates, as opposed to plain
definitions, is convenient because Isabelle automatically generates some useful
introduction and elimination lemmas, and configures structured Isar syntax for
case analysis.

From the SCL rules, we can prove a number of invariants about states. Most
of them are intuitive, while a few are technicalities of the Isabelle formalization. We
will use the invariants as hypotheses for many of the main lemmas and theorems.


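Lemma 1 (scl_state_invariants). Let $(\Gamma; U; \mathcal{C})$ be a state w.r.t. $\Rightarrow^{N,\beta}_{\text{SCL}}$.
The following invariants hold for the initial state $(\epsilon; \{\}; \top)$ and are each
individually preserved by the SCL rules.


1. All annotated literals in $\Gamma$ are ground.
   - $\forall K \in \{\mathrm{lit}(\mathcal{K}) \mid \mathcal{K} \in \mathrm{set}(\Gamma)\}.\ K$ is a ground literal
2. The atoms of all annotated literals in $\Gamma$ are $\preceq_{\mathcal{B}} \beta$.
   - $\forall K \in \{\mathrm{lit}(\mathcal{K}) \mid \mathcal{K} \in \mathrm{set}(\Gamma)\}.\ \mathrm{atom}(K) \preceq_{\mathcal{B}} \beta$
3. All annotated literals in $\Gamma$ are undefined in their respective subtrail of $\Gamma$.
   - $\forall \Gamma'\ K\ \Gamma''.\ \Gamma = \Gamma', K, \Gamma'' \longrightarrow \mathrm{lit}(K)$ is undefined in $\Gamma'$
4. All closures in $\Gamma$ and $\mathcal{C}$ are ground.
   - $\forall \mathcal{K} \in \mathrm{set}(\Gamma).\ \forall D\ K\ \gamma.\ \mathcal{K} = (K\gamma)^{(D \lor K)\cdot\gamma} \longrightarrow D\gamma$ is ground
   - $\forall C\ \gamma.\ \mathcal{C} = (C; \gamma) \longrightarrow C\gamma$ is ground
5. All closures in $\Gamma$ and $\mathcal{C}$ are false under their respective subtrail of $\Gamma$.
   - invariant 4 holds
   - $\forall D\ K\ \gamma\ \Gamma'\ \Gamma''.\ \Gamma = \Gamma', (K\gamma)^{(D \lor K)\cdot\gamma}, \Gamma'' \longrightarrow D\gamma$ is false under $\Gamma'$
   - $\forall C\ \gamma.\ \mathcal{C} = (C; \gamma) \longrightarrow C\gamma$ is false under $\Gamma$
6. All propagated literals in $\Gamma$ are the grounding of the non-ground literal in
   their closure annotations.
   - $\forall \mathcal{K} \in \mathrm{set}(\Gamma).\ \forall D\ K\ \gamma.\ \mathrm{ann}(\mathcal{K}) = (D; K; \gamma) \longrightarrow \mathrm{lit}(\mathcal{K}) = K\gamma$
7. The complements of all propagated literals in $\Gamma$ are absent from their closure
   annotation.
   - $\forall \mathcal{K} \in \mathrm{set}(\Gamma).\ \forall D\ K\ \gamma.\ \mathcal{K} = (K\gamma)^{(D \lor K)\cdot\gamma} \longrightarrow \mathrm{comp}(K\gamma) \notin D\gamma$
8. All literals of the clauses in $\Gamma$'s propagating clauses, $U$, and $\mathcal{C}$ have a
   corresponding, more general literal in $N$.
   - $\forall D \in \{D \mid (K\gamma)^{(D \lor K)\cdot\gamma} \in \mathrm{set}(\Gamma)\} \cup U \cup (\text{if } \mathcal{C} = (C; \gamma) \text{ then } \{C\} \text{ else } \{\}).$
     $\forall K \in D.\ \exists D' \in N.\ \exists K' \in D'.\ \exists \sigma.\ K'\sigma = K$
9. All annotated literals in $\Gamma$ have a corresponding more general literal either
   in $N$ or in $U$.
   - $\forall \mathcal{K} \in \mathrm{set}(\Gamma).\ \exists L \in N \cup U.\ \exists \sigma.\ L\sigma = \mathrm{lit}(\mathcal{K})$
10. All clauses in $\Gamma$, $U$, and $\mathcal{C}$ are entailed by $N$.
   - $\forall \mathcal{K} \in \mathrm{set}(\Gamma).\ \forall D\ K\ \gamma.\ \mathcal{K} = (K\gamma)^{(D \lor K)\cdot\gamma} \longrightarrow N \models \{K \lor D\}$
   - $\forall C\ \gamma.\ \mathcal{C} = (C; \gamma) \longrightarrow N \models \{C\}$
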
The SCL calculus is defined as a transition system where many decisions are
deferred to strategies. A strategy specifies a transition system whose transitions
are a subset of those from an existing transition system. We say that a strategy $S$
restricts a transition system $T$ (or symmetrically that $T$ is restricted by $S$) if
$(\forall x\ y.\ S\ x\ y \longrightarrow T\ x\ y)$. Note that strategies can be chained to iteratively apply
more restrictions.

We define the reasonable and regular strategies restricting the $\Rightarrow^{N,\beta}_{\text{SCL}}$ relation
in order to prove the main results of this paper.


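Definition 2. The reasonable strategy $\Rightarrow^{N,\beta}_{\text{Rea-SCL}}$ restricts the SCL calculus by
preventing decisions that immediately lead to a conflict. Such situations could be
replaced by a propagation. Formally:


$S \Rightarrow^{N,\beta}_{\text{Rea-SCL}} S' \longleftrightarrow S \Rightarrow^{N,\beta}_{\text{SCL}} S' \land (S \Rightarrow^{N,\beta}_{\text{Decide}} S' \longrightarrow (\nexists S''.\ S' \Rightarrow^{N,\beta}_{\text{Conflict}} S''))$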


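Definition 3. The regular strategy $\Rightarrow^{N,\beta}_{\text{Reg-SCL}}$ restricts the reasonable strategy
by prioritizing the Conflict rule over all others. Formally:


$S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S' \longleftrightarrow S \Rightarrow^{N,\beta}_{\text{Rea-SCL}} S' \land ((\exists S''.\ S \Rightarrow^{N,\beta}_{\text{Conflict}} S'') \longrightarrow S \Rightarrow^{N,\beta}_{\text{Conflict}} S')$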
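
To make Definitions 2 and 3 concrete, the following Python sketch applies them to a small, made-up finite transition system. States are opaque strings, each rule is a set of `(state, successor)` pairs, and the function names are hypothetical; nothing here models trails or clauses.

```python
from typing import Dict, Set, Tuple

Step = Tuple[str, str]


def scl(rules: Dict[str, Set[Step]]) -> Set[Step]:
    """The full relation: the union (disjunction) of all rules."""
    return set().union(*rules.values())


def reasonable(rules: Dict[str, Set[Step]]) -> Set[Step]:
    """Drop every Decide step whose target state allows a Conflict step."""
    conflict_sources = {s for (s, _) in rules["Conflict"]}
    return {
        (s, t)
        for (s, t) in scl(rules)
        if not ((s, t) in rules["Decide"] and t in conflict_sources)
    }


def regular(rules: Dict[str, Set[Step]]) -> Set[Step]:
    """If Conflict is applicable in a state, keep only its Conflict steps."""
    conflict_sources = {s for (s, _) in rules["Conflict"]}
    return {
        (s, t)
        for (s, t) in reasonable(rules)
        if s not in conflict_sources or (s, t) in rules["Conflict"]
    }


def restricts(strategy: Set[Step], system: Set[Step]) -> bool:
    """S restricts T if every S-step is also a T-step."""
    return strategy <= system


toy_rules = {
    "Decide": {("s0", "s1"), ("s0", "s2")},
    "Propagate": {("s0", "s3"), ("s3", "s6")},
    "Conflict": {("s1", "s4"), ("s3", "s5")},
}
print(sorted(reasonable(toy_rules)))
print(sorted(regular(toy_rules)))
```

In the toy system, the reasonable strategy removes the decision from `s0` to `s1` because a conflict is immediately applicable in `s1`, and the regular strategy additionally removes the propagation out of `s3` because a conflict is applicable there.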


While not required for the coming results, we also define the minimally back- 
tracking strategy to express the constraint on the backtracking position found 
in Sect. 2. 


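Definition 4. The minimally backtracking strategy $\Rightarrow^{N,\beta}_{\text{Min-Bac-SCL}}$ restricts the
regular strategy by requiring that backtracking removes the shortest possible suffix
of the trail. Formally:


$S \Rightarrow^{N,\beta}_{\text{Min-Bac-SCL}} S' \longleftrightarrow S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S' \land (S \Rightarrow^{N,\beta}_{\text{Backtrack}} S' \longrightarrow$
    $\mathrm{trail}(S')$ is the longest prefix of $\mathrm{trail}(S)$
    not in conflict with the learned clause$)$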


All three strategies build on one another and ultimately restrict the SCL
relation. We can express this formally as implications, of which the first can be
used to show that the coming results (e.g., Corollaries 13 and 19) also hold for the
minimally backtracking strategy.


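Lemma 5 (strategy_restrictions). The minimally backtracking strategy
restricts the regular strategy, which restricts the reasonable strategy, which
restricts the SCL calculus. Formally:


- $\forall N\ \beta\ S\ S'.\ S \Rightarrow^{N,\beta}_{\text{Min-Bac-SCL}} S' \longrightarrow S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S'$
- $\forall N\ \beta\ S\ S'.\ S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S' \longrightarrow S \Rightarrow^{N,\beta}_{\text{Rea-SCL}} S'$
- $\forall N\ \beta\ S\ S'.\ S \Rightarrow^{N,\beta}_{\text{Rea-SCL}} S' \longrightarrow S \Rightarrow^{N,\beta}_{\text{SCL}} S'$
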
The bounding atom $\beta$ restricts the calculus to only consider the finitely many
ground atoms less than or equal to $\beta$ w.r.t. $\prec_{\mathcal{B}}$; this will play an important role in
the termination proof. When SCL terminates, it has either derived a contradiction
or found a model for the bounded groundings of the initial clauses. Because $\beta$ is
usually chosen heuristically, the model might be unsatisfactory for the considered
use case and one may want to continue execution with a bigger bound. This is
allowed if the new bound properly extends the previous bound $\beta$ w.r.t. $\prec_{\mathcal{B}}$.


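Theorem 6 (monotonicity_wrt_bound). If the ground atoms bound by $\beta$ are
a subset of the ground atoms bound by $\beta'$, formally if $(\forall A.\ A$ is ground $\longrightarrow
A \preceq_{\mathcal{B}} \beta \longrightarrow A \preceq_{\mathcal{B}} \beta')$, then the SCL, reasonable SCL, regular SCL, and
minimally backtracking transitions w.r.t. $\beta$ are also transitions w.r.t. $\beta'$, formally


- $\forall N\ S\ S'.\ S \Rightarrow^{N,\beta}_{\text{SCL}} S' \longrightarrow S \Rightarrow^{N,\beta'}_{\text{SCL}} S'$,
- $\forall N\ S\ S'.\ S \Rightarrow^{N,\beta}_{\text{Rea-SCL}} S' \longrightarrow S \Rightarrow^{N,\beta'}_{\text{Rea-SCL}} S'$,
- $\forall N\ S\ S'.\ S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S' \longrightarrow S \Rightarrow^{N,\beta'}_{\text{Reg-SCL}} S'$, and
- $\forall N\ S\ S'.\ S \Rightarrow^{N,\beta}_{\text{Min-Bac-SCL}} S' \longrightarrow S \Rightarrow^{N,\beta'}_{\text{Min-Bac-SCL}} S'$.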


Theorem 6 implies that all properties w.r.t. a bound $\beta$ also hold w.r.t. a
compatible bound $\beta'$. Its hypothesis is fulfilled if $\prec_{\mathcal{B}}$ is transitive on ground
atoms, $\beta$ and $\beta'$ are ground atoms, and $\beta \preceq_{\mathcal{B}} \beta'$. The bounding atom could even
be increased at any point in an SCL run, not just when the calculus has terminated.

The different rules and strategies considered so far express a single step of 
computation for the SCL calculus; they offer a good level of granularity to both 
understand and mechanize the details of the calculus. But many results of the 
following sections ought to express properties of the calculus as a whole. We 
express such results in terms of a run from the initial state. A run is the reflexive, 
transitive closure of a rule or strategy, e.g. $S\ (\Rightarrow^{N,\beta}_{\text{SCL}})^{*}\ S'$ is an SCL run from the
state $S$ to the state $S'$.

The soundness of the individual SCL rules is shown by invariant 10. We now 
consider the soundness of terminating runs of the SCL calculus as a whole. 


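Theorem 7 (correct_termination). Let $S = (\Gamma; U; \mathcal{C})$ be a state w.r.t.
$\Rightarrow^{N,\beta}_{\text{SCL}}$. If invariants 2, 3, 5, 6 and 10 hold for $S$, and if $S$ is a stuck state
with some restrictions, formally if

- $\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Propagate}} S'$,
- $\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Decide}} S' \land (\nexists S''.\ S' \Rightarrow^{N,\beta}_{\text{Conflict}} S'')$,
- $\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Conflict}} S'$,
- $\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Skip}} S'$,
- $\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Resolve}} S'$,
- $\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Backtrack}} S'$ and the backtracking is minimal,

then either the conflicting clause $\bot$ has been derived and the groundings $\mathrm{gnd}(N)$
of the initial clauses $N$ are unsatisfiable, or there is no conflicting clause and
the groundings $\mathrm{gnd}^{\preceq_{\mathcal{B}}\beta}(N)$ of the initial clauses $N$ are satisfiable by the trail,
formally either


- $(\exists \gamma.\ \mathcal{C} = (\bot; \gamma)) \land (\nexists I.\ I \models \mathrm{gnd}(N))$, or
- $\mathcal{C} = \top \land \mathrm{HI}(\Gamma) \models \mathrm{gnd}^{\preceq_{\mathcal{B}}\beta}(N)$.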


Note that no hypothesis restricts the usage of the Factorize rule because it 
is an optional step of conflict resolution that has no impact on satisfiability. 

Theorem 7 holds for a family of strategies, in contrast to Theorem 5 from 
Bromberger et al., which was only shown for what is here called the minimally
backtracking strategy. This family of strategies contains any strategy that pre- 
serves the required invariants and is restricted by the minimally backtracking 
strategy. From Lemma 5 we know that these two requirements are fulfilled by 
the SCL relation but also by the reasonable, regular, and minimally backtracking 
strategies. This leads to a more intuitive corollary based on runs. 


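Corollary 8 (correct_termination_strategies). If an SCL, reasonable
SCL, regular SCL, or minimally backtracking SCL run starting from the
initial state $(\epsilon; \{\}; \top)$ terminates in a state $S = (\Gamma; U; \mathcal{C})$, formally any of

- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{SCL}})^{*}\ S \land (\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{SCL}} S')$,
- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Rea-SCL}})^{*}\ S \land (\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Rea-SCL}} S')$,
- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Reg-SCL}})^{*}\ S \land (\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S')$, or
- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Min-Bac-SCL}})^{*}\ S \land (\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Min-Bac-SCL}} S')$,

then the conclusion of Theorem 7 holds.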


Note that each strategy is used with positive polarity in the “run” hypothesis 
and negative polarity in the “no-more-step” hypothesis. For this reason, it is 
impossible to provide a corollary with a single requirement to restrict or be 
restricted by any known strategy. 

Traditional saturation-based calculi for first-order logic, e.g. Resolution and 
Superposition, can learn redundant clauses and thus their implementations 
require costly checks for non-redundancy. SCL(FOL) learns only non-redundant 
clauses. Thus, an implementation would not need to check for (forward) non- 
redundancy. We first repeat the definition of standard redundancy as found 
in [18]. 


Definition 9. A clause $C$ is redundant w.r.t. a set of clauses $N$ and a strict
order on clauses $\prec$ if $(\forall C' \in \mathrm{gnd}(C).\ \{D' \in \mathrm{gnd}(N) \mid D' \prec C'\} \models C')$.


We first prove non-redundancy w.r.t. a trail-induced dynamic order and then 
lift this result to non-redundancy w.r.t. a static order. 


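Definition 10. A trail $\Gamma$ induces a well-founded, strict partial order $\prec_{\Gamma}$, total
on all atoms in $\Gamma$'s literals. Assuming $\Gamma$ has the form $L_n^{x}, \ldots, L_2^{x}, L_1^{x}, L_0^{x}$ for all
$x \in \{\dagger, (D, \gamma_D)$ for some $D$ and $\gamma_D\}$, we have the following ordering.


$\mathrm{atom}(L_n) \prec_{\Gamma} \cdots \prec_{\Gamma} \mathrm{atom}(L_2) \prec_{\Gamma} \mathrm{atom}(L_1) \prec_{\Gamma} \mathrm{atom}(L_0)$


In other words, "older" elements on the left are smaller than "newer" elements
on the right. Formally:


$t_1 \prec_{\Gamma} t_2 \longleftrightarrow (\exists i < |\Gamma|.\ \exists j < i.\ t_1 = \mathrm{atom}(\mathrm{lit}(\Gamma[i])) \land t_2 = \mathrm{atom}(\mathrm{lit}(\Gamma[j])))$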
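
The formal characterization translates directly into code. In this Python sketch the trail is given as the list of the atoms of its literals, written left to right (oldest first); the function name and the list encoding are assumptions made for illustration.

```python
from typing import List


def trail_less(trail_atoms: List[str], t1: str, t2: str) -> bool:
    """t1 is smaller than t2 in the trail-induced order: t1 occurs at some
    index i and t2 at some index j < i, counting from the right (newest = 0)."""
    n = len(trail_atoms)

    def nth_from_right(k: int) -> str:
        # Zero-based indexing from the right-most (newest) element.
        return trail_atoms[n - 1 - k]

    return any(
        t1 == nth_from_right(i) and t2 == nth_from_right(j)
        for i in range(n)
        for j in range(i)
    )


atoms = ["P(a)", "Q(a)", "R(a)"]  # P(a) is the oldest, R(a) the newest
print(trail_less(atoms, "P(a)", "R(a)"))
print(trail_less(atoms, "R(a)", "P(a)"))
```

Atoms that do not occur in the trail are incomparable, so the order is partial in general and total only on the atoms of the trail.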
Compared to Bromberger et al., the trail-induced order is defined on atoms
instead of literals and non-redundancy is proven for any lifting to literals.


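Theorem 11 (dynamic_non_redundancy_regular_scl). Following conflict
resolution in a regular run, formally if


- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Reg-SCL}})^{*}\ (\Gamma; U; \top)$,
- $(\Gamma; U; \top) \Rightarrow^{N,\beta}_{\text{Conflict}} S_1$,
- $S_1\ (\Rightarrow^{N,\beta}_{\text{Skip, Factorize, Resolve}})^{+}\ S_n$, and
- $S_n \Rightarrow^{N,\beta}_{\text{Backtrack}} S_{1+n}$;

then neither is the learned clause $C = \mathrm{conflict}(S_n)$ generalized by any initial or
learned clause, formally $(\nexists D \in N \cup U.\ \exists \sigma.\ D\sigma = C)$, nor is it redundant w.r.t.
$N \cup U$ and the order we get by first lifting the trail-induced order $\prec_{\Gamma}$ from atoms
to literals and then taking its multiset extension.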


Dynamic non-redundancy with respect to the trail-induced order does not 
by itself release an implementation from performing backward non-redundancy 
checks, but it is a strong guarantee on the quality of learned clauses. For back- 
ward redundancy checks an order needs to be used that encompasses all dynamic 
trail-induced orders. An order based on a strict multiset relation has this prop- 
erty. So for backward redundancy we can, e.g., delete subsumed clauses. 


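Corollary 12 (static_non_subsumption_regular_scl). If a regular run
starting from the initial state $(\epsilon; \{\}; \top)$ learns a clause $C$, formally if

- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Reg-SCL}})^{*}\ (\Gamma; U; (C; \gamma))$ and
- $(\Gamma; U; (C; \gamma)) \Rightarrow^{N,\beta}_{\text{Backtrack}} S$,


then $C$ is not subsumed by any of the initial or learned clauses, formally
$\nexists D \in N \cup U.\ \exists \sigma.\ D\sigma \subseteq C$.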
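
For readers unfamiliar with subsumption, the condition "there is a substitution $\sigma$ with $D\sigma \subseteq C$" can be checked by a small backtracking matcher. The Python sketch below makes several simplifying assumptions that are not taken from the formalization: clauses are treated as sets rather than multisets, literals are `(polarity, predicate, args)` triples, all arguments are flat strings, and variables of the subsuming clause are marked by a leading `?`.

```python
def match_literal(pattern, target, sigma):
    """Extend sigma so that pattern instantiated by sigma equals target, or
    return None. Variables of the target are treated like constants."""
    (pol_p, pred_p, args_p), (pol_t, pred_t, args_t) = pattern, target
    if (pol_p, pred_p, len(args_p)) != (pol_t, pred_t, len(args_t)):
        return None
    extended = dict(sigma)
    for arg_p, arg_t in zip(args_p, args_t):
        if arg_p.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if extended.setdefault(arg_p, arg_t) != arg_t:
                return None
        elif arg_p != arg_t:
            return None
    return extended


def subsumes(d, c, sigma=None):
    """Set-based check: is there a sigma mapping every literal of d into c?"""
    sigma = {} if sigma is None else sigma
    if not d:
        return True
    first, rest = d[0], d[1:]
    for target in c:
        extended = match_literal(first, target, sigma)
        if extended is not None and subsumes(rest, c, extended):
            return True
    return False


learned = [(False, "P", ("?v",)), (False, "Q", ("?v",))]
c_same = [(False, "P", ("a",)), (False, "Q", ("a",)), (True, "R", ("b",))]
c_diff = [(False, "P", ("a",)), (False, "Q", ("b",))]
print(subsumes(learned, c_same), subsumes(learned, c_diff))
```

The second query fails because no single binding of `?v` maps both literals of the subsuming clause into the target clause.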


All non-redundancy results can be generalized to an arbitrary strategy restricting 
the regular strategy. We only show one example here and refer the reader to the 
formalization for the others. 


Corollary 13 (dynamic_non_redundancy_strategy). Following conflict
resolution in the run of a strategy restricting regular SCL, formally if


- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Strategy}})^{*}\ (\Gamma; U; \top)$,
- $(\Gamma; U; \top) \Rightarrow^{N,\beta}_{\text{Conflict}} S_1$,
- $S_1\ (\Rightarrow^{N,\beta}_{\text{Skip, Factorize, Resolve}})^{+}\ S_n$,
- $S_n \Rightarrow^{N,\beta}_{\text{Backtrack}} S_{1+n}$, and
- $\forall S\ S'.\ S \Rightarrow^{N,\beta}_{\text{Strategy}} S' \longrightarrow S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S'$,


then neither is the learned clause generalized by any initial or learned clause,
formally $(\nexists D \in N \cup U.\ \exists \sigma.\ D\sigma = C)$, nor is it redundant w.r.t. $N \cup U$ and the
order we get by first lifting the trail-induced order $\prec_{\Gamma}$ from atoms to literals and
then taking its multiset extension.
During the development of this formalization, we discovered that the original
Backtrack rule found in [6] allows learning a duplicate of the last learned clause,
which violates the stated non-redundancy of learned clauses. The original
Backtrack rule ensures that the conflict closure is not false under the new trail, but
the learned clause could still be in conflict w.r.t. another grounding. Following
this conflict, the Backtrack rule would be immediately applicable and would
learn the same clause again. This could only happen a finite number of times as
backtracking reduces the length of the (finite) trail. As an example, consider the
set of clauses $N = \{P(x), Q(y), \neg Q(z) \lor R(z), \neg R(w) \lor S(w), \neg P(v) \lor \neg S(v)\}$, and
a big enough $\beta$. The following SCL run was valid with the original Backtrack
rule. Note that the notation for the trail was shortened to save space.


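$(\epsilon; \{\}; \top)$
$(\Rightarrow^{N,\beta}_{\text{Decide}})^{+}$ $(P(a), Q(a), P(b), Q(b); \{\}; \top)$
$(\Rightarrow^{N,\beta}_{\text{Propagate}})^{+}$ $(P(a), Q(a), P(b), Q(b), R(b)^{(\neg Q(z) \lor R(z))\cdot\{z \mapsto b\}}, S(b)^{(\neg R(w) \lor S(w))\cdot\{w \mapsto b\}}; \{\}; \top)$
$\Rightarrow^{N,\beta}_{\text{Conflict}}$ $(P(a), Q(a), P(b), Q(b), R(b)^{(\neg Q(z) \lor R(z))\cdot\{z \mapsto b\}}, S(b)^{(\neg R(w) \lor S(w))\cdot\{w \mapsto b\}}; \{\}; (\neg P(v) \lor \neg S(v); \{v \mapsto b\}))$
$\Rightarrow^{N,\beta}_{\text{Resolve+Skip}}$ $(P(a), Q(a), P(b), Q(b), R(b)^{(\neg Q(z) \lor R(z))\cdot\{z \mapsto b\}}; \{\}; (\neg P(v) \lor \neg R(v); \{v \mapsto b\}))$
$\Rightarrow^{N,\beta}_{\text{Resolve+Skip}}$ $(P(a), Q(a), P(b), Q(b); \{\}; (\neg P(v) \lor \neg Q(v); \{v \mapsto b\}))$
$\Rightarrow^{N,\beta}_{\text{Backtrack}}$ $(P(a), Q(a), P(b); \{\neg P(v) \lor \neg Q(v)\}; \top)$
$\Rightarrow^{N,\beta}_{\text{Conflict+Skip}}$ $(P(a), Q(a); \{\neg P(v) \lor \neg Q(v)\}; (\neg P(v) \lor \neg Q(v); \{v \mapsto a\}))$
$\Rightarrow^{N,\beta}_{\text{Backtrack}}$ $(P(a); \{\neg P(v) \lor \neg Q(v)\}; \top)$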


This counterexample was only discovered when we failed to prove Theorem
11 in Isabelle. Note that this formalization is based on and was developed
simultaneously with Bromberger et al., which originally inherited the Backtrack rule
from [10]. The solution, which was promptly integrated into this formalization
and Bromberger et al., is for the Backtrack rule to find a position without
conflict w.r.t. the learned clause. Note that the original Backtrack rule reaches such
a state after having learned the same clause finitely often, which has no effect
on the set of learned clauses because sets ignore duplicates. Thus, the original
Backtrack rule did not invalidate the other properties of the SCL calculus. This
discovery is strong evidence of the usefulness of mechanized formalization for
both published work and ongoing research: the Isabelle formalization led to the
discovery of a previously unknown bug and it guided the development of the
refinement.
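
The essence of the counterexample can be replayed on the ground level. The Python sketch below is an illustration under simplifying assumptions: the trail is a list of `(polarity, atom)` pairs without annotations, the learned clause $\neg P(v) \lor \neg Q(v)$ is instantiated over the two constants `a` and `b` only, and the helper names are hypothetical.

```python
def comp(lit):
    return (not lit[0], lit[1])


def clause_false(trail, clause):
    """A ground clause is false if the complement of each literal is on the trail."""
    return all(comp(lit) in trail for lit in clause)


def learned_instance(constant):
    """Ground instance of the learned clause: not P(c) or not Q(c)."""
    return [(False, f"P({constant})"), (False, f"Q({constant})")]


def longest_conflict_free_prefix(trail, constants):
    """Longest prefix of the trail under which no ground instance of the
    learned clause is false (the backtracking target of Definition 4)."""
    for k in range(len(trail), -1, -1):
        prefix = trail[:k]
        if not any(clause_false(prefix, learned_instance(c)) for c in constants):
            return prefix
    return []


full_trail = [(True, "P(a)"), (True, "Q(a)"), (True, "P(b)"), (True, "Q(b)")]
after_first_backtrack = full_trail[:3]

# The original rule only checks the grounding of the conflict closure (v -> b).
print(clause_false(after_first_backtrack, learned_instance("b")))
# Another grounding (v -> a) is still false, so a second conflict follows.
print(clause_false(after_first_backtrack, learned_instance("a")))
print(longest_conflict_free_prefix(full_trail, ["a", "b"]))
```

Searching directly for the longest conflict-free prefix skips the intermediate state and lands on the trail that the run above reaches only after its second Backtrack step.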

A calculus expressed as a state machine terminates if the transition relation
starting from the initial state is well-founded following the arrow direction. We
prove well-foundedness of regular SCL in three steps: (1) we first prove
well-foundedness of SCL without backtracking, denoted $\Rightarrow^{N,\beta}_{\text{NoBack}}$; (2) we then
prove that a regular run can only learn finitely many clauses; and (3) from these
two results we finally prove well-foundedness of regular SCL. Step 1 is novel to
the formalization. Prior work in Bromberger et al. focuses exclusively on the
Backtrack rule (step 2) in order to prove termination of regular SCL (step 3).
Also novel to the formalization are decreasing measuring functions for steps 1
and 2.
Definition 14. The measuring function $M_3(N, \beta, S)$ for SCL without
backtracking maps a set of initial clauses $N$, a bounding atom $\beta$, and a state $S$
to a 4-tuple. The tuple elements are (1) a boolean identifying whether the state
is conflict-free, (2) a (finite) set overapproximating the literals that could be
added to the trail, (3) a (finite) list overapproximating the numbers of resolution
steps that could be performed at each position in the trail, and (4) the (finite)
cardinality of the conflicting clause. Formally:


$M_1(\beta, \Gamma) = \{L \mid \mathrm{atom}(L) \preceq_{\mathcal{B}} \beta\} - \{\mathrm{lit}(K) \mid K \in \mathrm{set}(\Gamma)\}$
$M_2(\epsilon, C) = \epsilon$
$M_2((\Gamma, K^{\dagger}), C) = M_2(\Gamma, C), 0$
$M_2((\Gamma, (K\gamma)^{(D \lor K)\cdot\gamma}), C) = \text{let } n = \mathrm{count}(C, \mathrm{comp}(K\gamma)) \text{ in}$
    $M_2(\Gamma, C \lor \mathrm{repeat}(n, D\gamma)), n$
$M_3(N, \beta, (\Gamma; U; (C; \gamma))) = (\text{False}, \{\}, M_2(\Gamma, C\gamma), |C|)$
With this, we can prove termination of SCL without backtracking (step 1). 


Theorem 15 (termination_scl_without_back). SCL without backtracking
is well-founded on all states reachable by an SCL-without-backtracking run
starting from the initial state, formally on $\{S \mid (\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{NoBack}})^{*}\ S\}$.


We now turn to proving termination of regular SCL with backtracking by first 
defining an appropriate measuring function. 


Definition 16. The measuring function $M_4(\beta, S)$ for the rule Backtrack maps
a bounding atom $\beta$ and a state $S$ to a finite set of clauses without duplicates. It
computes an over-approximation of the set of clauses that could still be learned
modulo duplicates. Formally:


$M_4(\beta, S) = 2^{\{L \mid \mathrm{atom}(L) \preceq_{\mathcal{B}} \beta\}} - \{\mathrm{set}(C) \mid C \in \mathrm{gnd}(\mathrm{learned}(S))\}$


We then prove that it decreases every time we learn a new clause (step 2). 
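
The shape of this measure is easy to reproduce on a tiny ground example. In the following Python sketch, the finite set of bounded literals is passed in explicitly, learned clauses are given directly as ground clauses (lists of literals, possibly with duplicates), and the names `powerset` and `m4` are chosen for illustration only.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of a finite collection, as a set of frozensets."""
    items = list(items)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def m4(bounded_literals, learned_ground_clauses):
    """Sets of bounded literals that do not yet coincide, modulo duplicates,
    with a ground instance of a learned clause."""
    learned_sets = {frozenset(c) for c in learned_ground_clauses}
    return powerset(bounded_literals) - learned_sets


literals = [(True, "P"), (False, "P")]
before = m4(literals, [])
# Learning a clause whose ground instance is (not P) or (not P): the duplicate
# literal is collapsed by the conversion to a set.
after = m4(literals, [[(False, "P"), (False, "P")]])
print(len(before), len(after), after < before)
```

Because the powerset of a finite set is finite, a measure that strictly shrinks at every Backtrack step bounds the number of clauses that can be learned.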


Lemma 17 (M_back_after_regular_backtrack). Following conflict
resolution in a regular run, formally if


- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Reg-SCL}})^{*}\ (\Gamma; U; \top)$,
- $(\Gamma; U; \top) \Rightarrow^{N,\beta}_{\text{Conflict}} S_1$,
- $S_1\ (\Rightarrow^{N,\beta}_{\text{Skip, Factorize, Resolve}})^{+}\ S_n$, and
- $S_n \Rightarrow^{N,\beta}_{\text{Backtrack}} S_{1+n}$; then


1. the ground conflict is distinct from all groundings of initial and learned
clauses modulo duplicates, formally $(\exists C\ \gamma.\ \mathrm{conflict}(S_n) = (C; \gamma) \land \mathrm{set}(C\gamma) \notin
\{\mathrm{set}(D) \mid D \in \mathrm{gnd}(N \cup U)\})$, and
2. the set of clauses that could potentially be learned strictly diminishes, formally
$M_4(\beta, S_{1+n}) \subset M_4(\beta, S_n)$.


Lemma 17 is novel to the formalization. Together with Theorem 15 it allows us 
to prove termination of regular SCL with backtracking (step 3). 


Theorem 18 (termination_regular_scl). Regular SCL is well-founded on
all states reachable by a regular-SCL run starting from the initial state, formally
on $\{S \mid (\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Reg-SCL}})^{*}\ S\}$.


All termination results can be generalized to an arbitrary strategy restricting 
the regular strategy. We only show one example here and refer the reader to the 
formalization for the others. 


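Corollary 19. If a strategy restricts regular SCL,
formally if $(\forall S\ S'.\ S \Rightarrow^{N,\beta}_{\text{Strategy}} S' \longrightarrow S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S')$, then it is well-founded
on all states reachable by a run using this strategy and starting from the initial
state, formally on $\{S \mid (\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Strategy}})^{*}\ S\}$.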


All theorems until now were first expressed and proven using invariants and 
then the versions expressed using runs were derived. However, Theorem 18 posed 
an interesting problem because its proof requires the backtracking step to have 
knowledge of the trail when a conflict last occurred. But this information is 
lost in the SCL state due to the Skip rule shrinking the trail. We did define an 
invariant that expresses the historical form of the trail and its properties derived 
from the regular strategy, but it is complex and the added value compared to 
working directly on a regular run is questionable. For simplicity, we chose not 
to present this invariant in this paper. 

Together, soundness and termination allow us to prove refutational complete- 
ness of the regular SCL calculus w.r.t. a fixed bound. 


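Theorem 20 (completeness_wrt_bound). If the groundings $\mathrm{gnd}^{\preceq_{\mathcal{B}}\beta}(N)$ of
the initial clauses $N$ are unsatisfiable, then all regular SCL runs starting from
the initial state terminate and derive the conflicting clause $\bot$, formally


- there is no infinite regular-SCL run starting from the initial state, and
- $(\epsilon; \{\}; \top)\ (\Rightarrow^{N,\beta}_{\text{Reg-SCL}})^{*}\ S \land (\nexists S'.\ S \Rightarrow^{N,\beta}_{\text{Reg-SCL}} S') \longrightarrow (\exists \gamma.\ \mathrm{conflict}(S) =
  (\bot; \gamma))$.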


Theorem 20 is only defined w.r.t. a bound, but fortunately we can prove that 
there must always exist an appropriate bound. 


Lemma 21 (ex_bound_if_unsat). If the relation $\prec_{\mathcal{B}}$ is a well-founded, strict
order, total on ground atoms and the groundings $\mathrm{gnd}(N)$ of the initial clauses $N$
are unsatisfiable, then there exists a bound $\beta$ such that the groundings $\mathrm{gnd}^{\preceq_{\mathcal{B}}\beta}(N)$
are unsatisfiable.


Note that while Lemma 21 proves the existence of an appropriate bound, it 
provides no constructive way of finding one. What one can do is follow along The- 
orem 6 and iteratively increase a heuristically chosen bound until an appropriate 
one is found; if the set of initial clauses is unsatisfiable, this will terminate. 
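
The iteration suggested above amounts to a simple driver loop. The Python sketch below is purely schematic: `run_bounded` stands for a hypothetical procedure that runs regular SCL to termination for a given bound and reports `"unsat"` or `"model"`, and `bounds` is assumed to enumerate increasingly large bounds that eventually exceed any fixed bound.

```python
from itertools import count


def refute_with_growing_bound(run_bounded, bounds):
    """Try the bounds in order and return the first one for which the bounded
    run derives a contradiction. Does not terminate on an infinite sequence
    of bounds if no bound ever yields "unsat"."""
    for bound in bounds:
        if run_bounded(bound) == "unsat":
            return bound
    return None


def fake_run(bound):
    # Stand-in for a bounded SCL run: pretend the groundings become
    # unsatisfiable once the bound reaches 3.
    return "unsat" if bound >= 3 else "model"


print(refute_with_growing_bound(fake_run, count(1)))
```

By Theorem 6, every transition made under a smaller bound remains a transition under a larger one, so an implementation could in principle continue from the current state instead of restarting; the sketch restarts only for simplicity.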
Isabelle Technicalities. Lemma 21's hypothesis that $\prec_{\mathcal{B}}$ is a well-founded,
total, strict order cannot be expressed as a theorem-local hypothesis. The
reason is that the compactness theorem for clausal first-order logic requires terms
to be an instance of the wellorder type class, which is not the case in the
scl_fol_calculus locale, where the assumptions on the $\prec_{\mathcal{B}}$ relation are kept
minimal. Because Isabelle does not allow instantiating a type class with a
concrete type inside a locale or theorem, we define a new locale that extends
scl_fol_calculus and adds a type class requirement on the first-order term
constants. This enables the type-class system to automatically instantiate the
wellorder type class for terms using the previously registered Knuth-Bendix
order. We then instantiate the $\prec_{\mathcal{B}}$ relation of scl_fol_calculus with the
Knuth-Bendix order. These type class and locale gymnastics could be avoided if
the formalization of the compactness theorem was refactored to offer a
predicate-based version alongside the existing type-class-based version.


4 Conclusion 


We generalized and formalized the SCL(FOL) calculus in Isabelle/HOL. The
main results are formal proofs of soundness, non-redundancy of learned clauses,
termination, and refutational completeness. Because the formalization was
performed simultaneously with Bromberger et al., the two efforts could benefit from each other.
A mechanized formalization must consider low-level details, but it is also an
opportunity to identify the most important aspects of the theory and abstract over
details needed only in the context of an actual implementation. For example, we
abstracted from the level of a state to define the Backtrack rule and replaced
it with an abstract specification of the result. A level was used in all
pen-and-paper presentations of the calculus in order to have a constructive way of going
back to the maximal trail where the learned clause propagates. The abstraction
supports the investigation of several versions of the Backtrack rule and allows basing
the soundness result on a version with a minimal requirement, i.e., the learned clause is
no longer false with respect to the trail.

The formalization did uncover a small bug in the calculus, but also showed
that its effect was very localized and naturally led to a solution. Another
benefit of the formalization is how much it supports refactoring and exploratory
experimentation. When making a change to a definition or a conjecture, Isabelle
immediately and exhaustively points to the parts that need to be adapted. Very
often, proofs can automatically be adapted using proof automation tools such
as Sledgehammer. This was invaluable to quickly try out ideas or change subtle
parts of the calculus. One such example is in the Resolve rule, where the
formalization first used substitution composition as found in the original calculus and
later replaced it by an abstract specification of merged grounding. This idea
came from a private discussion sketching an eventual C implementation where
it became clear that substitution composition would be a costly operation. We
then introduced the abstract specification of merged grounding and fixed the
formalization by following the errors reported by Isabelle.
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Abstract. We show that SCL(FOL) can simulate the derivation of non- 
redundant clauses by superposition for first-order logic without equal- 
ity. Superposition-based reasoning is performed with respect to a fixed 
reduction ordering. The completeness proof of superposition relies on the 
grounding of the clause set. It builds a ground partial model according 
to the fixed ordering, where minimal false ground instances of clauses 
then trigger non-redundant superposition inferences. We define a respec- 
tive strategy for the SCL calculus such that clauses learned by SCL and 
superposition inferences coincide. From this perspective the SCL calculus 
can be viewed as a generalization of the superposition calculus. 


Keywords: first-order reasoning · superposition · SCL ·
non-redundant clause learning


1 Introduction 


Superposition [1,2,18] is currently considered the prime calculus for
first-order logic reasoning: all leading first-order theorem provers implement
a variant thereof [14,16,20,22]. More recently, the family of SCL calculi
(Clause Learning from Simple Models, or just Simple Clause Learning)
[4,8,9,11,17] was introduced. First experimental results [3] are available, as
are first steps towards an overall implementation [5,7].

The main differences between superposition and SCL for first-order logic
without equality are: (i) superposition assumes a fixed ordering on literals,
whereas the ordering in SCL is dynamic and evolves out of the satisfiability
of clauses; (ii) superposition performs single superposition left and
factoring inferences, whereas SCL typically performs several such inferences
to derive a single learned clause; (iii) the superposition model operator is
not effective on the non-ground clause level, whereas the SCL model assumption
is effective. For first-order logic without equality, superposition reduces to
ordered resolution combined with the powerful superposition redundancy
criterion. Our simulation result cannot be one-to-one because an SCL learned
clause is typically generated by several superposition inferences, and
superposition factoring inferences are performed by SCL only in the context of
resolution inferences. The simulation result considers the ground case, where
the superposition strategy used in the completeness proof only triggers
non-redundant inferences [1]. We call this strategy SUP-MO (Definition 5).
Overall first-order superposition completeness is then obtained by a lifting
argument to the non-ground clause level. We actually show that a superposition
refutation of some ground clause set can be simulated by an SCL refutation on
the same clause set, such that they coincide on all superposition left
(ordered resolution) inferences. For the superposition calculus we refer to
[1] and for SCL to [9]; all main properties of both calculi have meanwhile
been verified inside the Isabelle framework [10,19,21].

For example, consider a superposition refutation of the simple ground clause
set

$$N^0_{\mathrm{SUP}} = \{(C_1)\ P(a) \lor P(a),\ (C_2)\ \neg P(a) \lor Q(b),\ (C_3)\ \neg Q(b)\}$$

with respect to a KBO [13], where all symbols have weight one, and precedence
$a \prec b \prec P \prec Q$. Superposition generates only non-redundant
clauses. Then with respect to the usual superposition ordering extension to
literals and clauses we get
$(C_1) \prec_{\mathrm{KBO}} (C_2) \prec_{\mathrm{KBO}} (C_3)$ and the
superposition model operator produces the Herbrand model
$(N^0_{\mathrm{SUP}})_{\mathcal{I}} = \emptyset$. Now clause $(C_1)$ is the
minimal false clause, triggering a factoring inference resulting in
$(C_4)\ P(a)$ and clause set
$N^1_{\mathrm{SUP}} = N^0_{\mathrm{SUP}} \cup \{(C_4)\ P(a)\}$. The clause
$P(a)$ cannot be derived by SCL because factoring is only performed in the
context of resolution inferences. Now $(C_4)$ is the smallest clause in
$N^1_{\mathrm{SUP}}$ and the superposition model operator produces
$(N^1_{\mathrm{SUP}})_{\mathcal{I}} = \{P(a), Q(b)\}$ with minimal false
clause $(C_3)$. A superposition left inference between $(C_3)$ and $(C_2)$
generates $(C_5)\ \neg P(a)$ and
$N^2_{\mathrm{SUP}} = N^1_{\mathrm{SUP}} \cup \{(C_5)\ \neg P(a)\}$. The
generation of $\neg P(a)$ can now be simulated by SCL by constructing the SCL
trail $[P(a)^{1}, Q(b)^{\neg P(a) \lor Q(b)}]$ out of
$N^0_{\mathrm{SUP}} = N^0_{\mathrm{SCL}}$, leading to the learned clause
$(C_5)\ \neg P(a)$ and respective clause set
$N^1_{\mathrm{SCL}} = N^0_{\mathrm{SCL}} \cup \{(C_5)\ \neg P(a)\}$. Note that
$P(a)$ could have also been propagated, see Sect. 2 rule Propagate, but this
would eventually not lead to the learned clause $(C_5)\ \neg P(a)$ but to
$\bot$. Finally, the superposition model operator produces
$(N^2_{\mathrm{SUP}})_{\mathcal{I}} = \{P(a), Q(b)\}$ with minimal false
clause $(C_5)$ and infers $\bot$. The SCL simulation generates the trail
$[P(a)^{P(a) \lor P(a)}]$ and then learns $\bot$ as well out of a conflict
with $(C_5)$. Note that this SCL trail is based on a factoring of $(C_1)$ to
$P(a)$ that was the explicit first step of the superposition refutation.
Recall that by using an exhaustive propagation strategy, SCL would start with
the trail $[P(a)^{P(a) \lor P(a)}, Q(b)^{\neg P(a) \lor Q(b)}]$ and
immediately derive $\bot$. Exhaustive propagation is not a good strategy in
general, because first-order logic clauses may enable infinitely many
propagations. Even together with the typical SCL restriction to finitely many
ground instances, there are exponentially many propagations possible, in
general. Therefore, the regular strategy defined in [9] does not require
exhaustive propagation, but guarantees non-redundant clause learning. The
SCL-SUP strategy (Definition 8 and Definition 10), which simulates
superposition SUP-MO runs, is also a regular strategy (Lemma 17).

The paper is organized as follows. After recalling the needed concepts of SCL
and superposition in Sect. 2, the simulation result is presented in Sect. 3.
We show that any superposition refutation of a ground clause set producing
only non-redundant inferences through the SUP-MO strategy can be simulated via
the SCL-SUP strategy. We prove the 14 simulation invariants of Definition 7 by
an inductive argument on the length of the superposition refutation, starting
from the initial state (Lemma 13), for intermediate superposition inference
steps (Lemma 14), until the final refutation (Lemma 15 and Lemma 16). For the
simulation we do not consider selection in superposition inferences in favor
of a less complicated presentation. The paper ends with a discussion of the
obtained results. A full version of the paper including all proofs is
available on arXiv [6].


2 Preliminaries 


We assume a first-order language without equality where N denotes a clause
set; C, D denote clauses; L, K, H denote literals; A, B denote atoms; P, Q, R
denote predicates; t, s terms; f, g, h function symbols; a, b, c constants;
and x, y, z variables. Atoms, literals, clauses and clause sets are considered
as usual, where in particular clauses are identified both with their
disjunction and multiset of literals [9]. The complement of a literal is
denoted by the function comp. The function atom(L) denotes the atomic part of
a literal. Semantic entailment $\models$ is defined as usual where variables
in clauses are assumed to be universally quantified. Substitutions
$\sigma, \tau$ are total mappings from variables to terms, where
$\mathrm{dom}(\sigma) := \{x \mid x\sigma \neq x\}$ is finite and
$\mathrm{codom}(\sigma) := \{t \mid x\sigma = t, x \in \mathrm{dom}(\sigma)\}$.
Their application is extended to literals, clauses, and sets of such objects
in the usual way. A term, atom, clause, or a set of these objects is ground if
it does not contain any variable. A substitution $\sigma$ is ground if
$\mathrm{codom}(\sigma)$ is ground. A substitution $\sigma$ is grounding for a
term t, literal L, clause C if $t\sigma$, $L\sigma$, $C\sigma$ is ground,
respectively. The function mgu denotes the most general unifier of two terms,
atoms, literals. We assume that any mgu of two terms or literals does not
introduce any fresh variables and is idempotent. A closure is denoted as
$C \cdot \sigma$ and is a pair of a clause C and a substitution $\sigma$ that
is grounding for C. The function ground returns the set of all ground
instances of a literal, clause, or clause set with respect to the signature of
the respective clause set.

A (partial) model M for a clause set N is a satisfiable set of ground
literals. A ground clause C is true in M, denoted $M \models C$, if
$C \cap M \neq \emptyset$, and false otherwise. A ground clause set N is true
in M, denoted $M \models N$, if all clauses from N are true in M. A (partial)
Herbrand model I for a clause set N is a set of ground atoms. A ground clause
C is true in I, denoted $I \models_H C$, if there is an atom $A \in C$ such
that $A \in I$, or there is a negative literal $\neg A \in C$ such that
$A \notin I$, and false otherwise. A ground clause set N entails a ground
clause C, denoted $N \models C$, if $M \models N$ implies $M \models C$ for
all models M.

We identify sets and sequences whenever appropriate. However, the trail of 
an SCL run is always a sequence of ground literals. 

Let $\prec$ denote a well-founded, total, strict ordering on ground literals.
This ordering is then lifted to clauses and clause sets by its respective
multiset extension. We overload $\prec$ for literals, clauses, clause sets if
the meaning is clear from the context. The ordering is lifted to the
non-ground case via instantiation: we define $C \prec D$ if for all grounding
substitutions $\sigma$ it holds $C\sigma \prec D\sigma$. We define $\preceq$
as the reflexive closure of $\prec$ and
$N^{\preceq C} := \{D \mid D \in N \text{ and } D \preceq C\}$.

Definition 1 (Clause Redundancy). A ground clause C is redundant with
respect to a ground clause set N and an order $\prec$ if
$N^{\preceq C} \models C$. A clause C is redundant with respect to a clause
set N and an order $\prec$ if for all $C' \in \mathrm{ground}(C)$ it holds
that $C'$ is redundant with respect to ground(N).


Let $\prec_B$ denote a well-founded, total, strict ordering on ground atoms
such that for any ground atom A there are only finitely many ground atoms B
with $B \prec_B A$. For example, an instance of such an ordering could be KBO
without zero-weight symbols. (Note that LPO does not satisfy the last
condition of a $\prec_B$ ordering although it is a well-founded, total, strict
ordering.) The ordering $\prec_B$ is lifted to literals by comparing the
respective atoms, and if the atoms of two literals are the same, then the
negative version of the literal is larger than the positive version. It is
lifted to clauses by a multiset extension.
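
The lifting of an atom ordering to literals and clauses can be sketched in a few lines of Python. This is only an illustration for the ground case, not part of the calculus: the names `literal_key`, `clause_key`, and `rank` are hypothetical, atoms are plain strings, and the atom ordering is assumed to be given explicitly as a rank table. For a total ordering, the multiset extension coincides with a lexicographic comparison of the literals sorted in descending order, which is what the sketch relies on.

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, bool]      # (ground atom, True if the literal is positive)
Clause = Tuple[Literal, ...]    # a ground clause as a multiset of literals


def literal_key(lit: Literal, rank: Dict[str, int]) -> Tuple[int, int]:
    """Sort key for literals: atoms are compared first; for equal atoms the
    negative literal is larger than the positive one."""
    atom, positive = lit
    return (rank[atom], 0 if positive else 1)


def clause_key(clause: Clause, rank: Dict[str, int]) -> List[Tuple[int, int]]:
    """Sort key realizing the multiset extension of a total literal ordering:
    literals in descending order, compared lexicographically."""
    return sorted((literal_key(lit, rank) for lit in clause), reverse=True)


# Assumed atom ordering P(a) < Q(b), as in the introductory example.
rank = {"P(a)": 0, "Q(b)": 1}
c1 = (("P(a)", True), ("P(a)", True))     # P(a) v P(a)
c2 = (("P(a)", False), ("Q(b)", True))    # not P(a) v Q(b)
c3 = (("Q(b)", False),)                   # not Q(b)
ordered = sorted([c3, c1, c2], key=lambda c: clause_key(c, rank))
```

Under these assumptions `ordered` lists the clauses in the order C1, C2, C3 used in the introduction.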


The SCL(FOL) Calculus: The inference rules of SCL(FOL) [9] are represented
by an abstract rewrite system. They operate on a problem state, a six-tuple
$(\Gamma; N; U; \beta; k; D)$ where $\Gamma$ is a sequence of annotated ground
literals, the trail; N and U are the sets of initial and learned clauses;
$\beta$ is a ground literal limiting the size of the trail; k counts the
number of decisions; and D is either $\top$, $\bot$ or a clause closure
$C \cdot \sigma$ such that $C\sigma$ is ground and false in $\Gamma$. Literals
in $\Gamma$ are either annotated with a number, also called a level, i.e.,
they have the form $L^k$ meaning that L is the k-th guessed decision literal,
or they are annotated with a closure that propagated the literal to become
true. A ground literal L is of level i with respect to a problem state
$(\Gamma; N; U; \beta; k; D)$ if L or comp(L) occurs in $\Gamma$ and the first
decision literal left from L (comp(L)) in $\Gamma$, including L, is annotated
with i. If there is no such decision literal then its level is zero. A ground
clause D is of level i with respect to a problem state
$(\Gamma; N; U; \beta; k; D)$ if i is the maximal level of a literal in D. The
level of the empty clause $\bot$ is 0. Recall D is a non-empty closure or
$\top$ or $\bot$. Similarly, a trail $\Gamma$ is of level i if the maximal
literal in $\Gamma$ is of level i.
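
The notion of a level can be made concrete with a small Python sketch for ground trails. The representation is an assumption made for illustration only: a trail entry is a pair of a literal and its annotation, where an `int` annotation marks a decision and a clause annotation marks a propagation (the substitution of the closure is omitted because everything is ground). The names `literal_level` and `clause_level` are hypothetical, and ignoring undefined literals in `clause_level` is a choice of this sketch.

```python
from typing import List, Optional, Tuple, Union

Literal = Tuple[str, bool]                   # (ground atom, True if positive)
Clause = Tuple[Literal, ...]
TrailEntry = Tuple[Literal, Union[int, Clause]]


def comp(lit: Literal) -> Literal:
    """Complement of a literal."""
    return (lit[0], not lit[1])


def literal_level(lit: Literal, trail: List[TrailEntry]) -> Optional[int]:
    """Level of lit: annotation of the first decision literal to the left of
    lit (or comp(lit)) on the trail, including the literal itself; 0 if there
    is no such decision; None if neither lit nor comp(lit) is on the trail."""
    for pos, (entry, _) in enumerate(trail):
        if entry in (lit, comp(lit)):
            for _, annotation in reversed(trail[: pos + 1]):
                if isinstance(annotation, int):
                    return annotation
            return 0
    return None


def clause_level(clause: Clause, trail: List[TrailEntry]) -> int:
    """Maximal level of a literal in the clause; the empty clause has level 0.
    Literals that are undefined in the trail are ignored (sketch assumption)."""
    levels = [literal_level(lit, trail) for lit in clause]
    return max((lv for lv in levels if lv is not None), default=0)


# Trail [P(a)^1, Q(b)^(not P(a) v Q(b))] from the introductory example.
c2 = (("P(a)", False), ("Q(b)", True))
trail = [(("P(a)", True), 1), (("Q(b)", True), c2)]
```

Here the propagated literal Q(b) inherits level 1 from the decision P(a) to its left.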

A literal/atom L/A is undefined in $\Gamma$ if neither L/A nor
comp(L)/comp(A) occur in $\Gamma$. The start state of SCL is
$(\epsilon; N; \emptyset; \beta; 0; \top)$ for some initial clause set N and
bound $\beta$. The rules below are exactly the rules from [9] and serve as a
reference for our simulation proof in Sect. 3.

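Propagate
$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL}} (\Gamma, L\sigma^{(C_0 \lor L)\delta \cdot \sigma}; N; U; \beta; k; \top)$

provided $C \lor L \in (N \cup U)$, $C = C_0 \lor C_1$,
$C_1\sigma = L\sigma \lor \cdots \lor L\sigma$, $C_0\sigma$ does not contain
$L\sigma$, $\delta$ is the mgu of the literals in $C_1$ and $L$,
$(C \lor L)\sigma$ is ground, $(C \lor L)\sigma \prec_B \{\beta\}$,
$C_0\sigma$ is false under $\Gamma$, and $L\sigma$ is undefined in $\Gamma$.

Decide
$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL}} (\Gamma, L\sigma^{k+1}; N; U; \beta; k+1; \top)$

provided atom(L) occurs in C for a $C \in (N \cup U)$, $L\sigma$ is a ground
literal undefined in $\Gamma$, and $L\sigma \prec_B \beta$.

Conflict
$(\Gamma; N; U; \beta; k; \top) \Rightarrow_{\mathrm{SCL}} (\Gamma; N; U; \beta; k; D \cdot \sigma)$

provided $D \in (N \cup U)$, $D\sigma$ false in $\Gamma$ for a grounding
substitution $\sigma$.

Skip
$(\Gamma, L; N; U; \beta; k; D \cdot \sigma) \Rightarrow_{\mathrm{SCL}} (\Gamma; N; U; \beta; k - i; D \cdot \sigma)$

provided comp(L) does not occur in $D\sigma$, if L is a decision literal then
i = 1, otherwise i = 0.

Factorize
$(\Gamma; N; U; \beta; k; (D \lor L \lor L') \cdot \sigma) \Rightarrow_{\mathrm{SCL}} (\Gamma; N; U; \beta; k; (D \lor L)\eta \cdot \sigma)$

provided $L\sigma = L'\sigma$, $\eta = \mathrm{mgu}(L, L')$.

Resolve
$(\Gamma, L\delta^{(C \lor L) \cdot \delta}; N; U; \beta; k; (D \lor L') \cdot \sigma) \Rightarrow_{\mathrm{SCL}} (\Gamma, L\delta^{(C \lor L) \cdot \delta}; N; U; \beta; k; (D \lor C)\eta \cdot \sigma\delta)$

provided $L\delta = \mathrm{comp}(L'\sigma)$,
$\eta = \mathrm{mgu}(L, \mathrm{comp}(L'))$.

Backtrack
$(\Gamma_0, K, \Gamma_1, \mathrm{comp}(L\sigma)^k; N; U; \beta; k; (D \lor L) \cdot \sigma) \Rightarrow_{\mathrm{SCL}} (\Gamma_0; N; U \cup \{D \lor L\}; \beta; j; \top)$

provided $D\sigma$ is of level $i' < k$, and $\Gamma_0, K$ is the minimal
trail subsequence such that there is a grounding substitution $\tau$ with
$(D \lor L)\tau$ is false in $\Gamma_0, K$ but not in $\Gamma_0$, and
$\Gamma_0$ is of level j.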

A sequence of rule applications of a particular calculus is called a run of the 

calculus. A strategy for a calculus restricts the set of runs we actually allow by 
imposing further conditions on the allowed rule applications. 


Definition 2 (SCL Runs). A sequence of SCL rule applications is called a 
reasonable run if the rule Decide does not enable an immediate application of 
rule Conflict. A sequence of SCL rule applications is called a regular run if it is 
a reasonable run and the rule Conflict has precedence over all other rules. 


All regular SCL runs are sound, only derive non-redundant clauses, always 
terminate, and SCL with a regular strategy is refutationally complete (for first- 
order logic without equality) [9]. 


The Superposition Calculus: Superposition [1,2,18] is a calculus for
first-order logic reasoning that also infers/learns new clauses like SCL. In
contrast to SCL, it does these inferences based on a static ordering $\prec$
and, at the level of inference rules, independent of a partial model. A
permissible ordering $\prec$ for the superposition calculus is always a
well-founded, total, strict ordering on ground literals. This ordering is then
lifted to clauses and clause sets by its respective multiset extension. A
problem state in the superposition calculus is just a set N of clauses. The
start state is the initial clause set. Due to the restriction to first-order
logic without equality, the most basic version of the superposition calculus
consists just of the following two rules (without selection):


Superposition Left
$(N \uplus \{C_1 \lor P(t_1,\ldots,t_n),\ C_2 \lor \neg P(s_1,\ldots,s_n)\}) \Rightarrow_{\mathrm{SUP}} (N \cup \{C_1 \lor P(t_1,\ldots,t_n),\ C_2 \lor \neg P(s_1,\ldots,s_n)\} \cup \{(C_1 \lor C_2)\sigma\})$

where (i) $P(t_1,\ldots,t_n)\sigma$ is strictly maximal in
$(C_1 \lor P(t_1,\ldots,t_n))\sigma$, (ii) $\neg P(s_1,\ldots,s_n)\sigma$ is
maximal, (iii) $\sigma$ is the mgu of $P(t_1,\ldots,t_n)$ and
$P(s_1,\ldots,s_n)$.

Factoring
$(N \uplus \{C \lor P(t_1,\ldots,t_n) \lor P(s_1,\ldots,s_n)\}) \Rightarrow_{\mathrm{SUP}} (N \cup \{C \lor P(t_1,\ldots,t_n) \lor P(s_1,\ldots,s_n)\} \cup \{(C \lor P(t_1,\ldots,t_n))\sigma\})$

where (i) $P(t_1,\ldots,t_n)\sigma$ is maximal in
$(C \lor P(t_1,\ldots,t_n) \lor P(s_1,\ldots,s_n))\sigma$, (ii) $\sigma$ is
the mgu of $P(t_1,\ldots,t_n)$ and $P(s_1,\ldots,s_n)$.

Let sfac(C) represent a clause obtained by exhaustively applying superposition
Factoring on C. Recall that superposition Factoring only applies to maximal
positive literals. Let sfac(N) represent the clause set N after every clause
has been exhaustively factorized by superposition Factoring.
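
For ground clauses, exhaustive Factoring amounts to keeping a single occurrence of the maximal literal whenever that literal is positive. A minimal Python sketch under the same illustrative assumptions as before (atoms as strings, the atom ordering given by a hypothetical `rank` table, the function name `sfac` borrowed from the text):

```python
from typing import Dict, Tuple

Literal = Tuple[str, bool]      # (ground atom, True if positive)
Clause = Tuple[Literal, ...]


def literal_key(lit: Literal, rank: Dict[str, int]) -> Tuple[int, int]:
    """Atoms first; for equal atoms the negative literal is larger."""
    atom, positive = lit
    return (rank[atom], 0 if positive else 1)


def sfac(clause: Clause, rank: Dict[str, int]) -> Clause:
    """Exhaustive ground Factoring: if the maximal literal is positive, keep
    exactly one occurrence of it; otherwise Factoring is not applicable."""
    if not clause:
        return clause
    top = max(clause, key=lambda lit: literal_key(lit, rank))
    if not top[1]:
        return clause
    rest = tuple(lit for lit in clause if lit != top)
    return rest + (top,)


# Assumed atom ordering P(a) < P(b) < Q(a), as in Example 9.
rank = {"P(a)": 0, "P(b)": 1, "Q(a)": 2}
c3 = (("P(a)", False), ("Q(a)", True), ("Q(a)", True))   # not P(a) v Q(a) v Q(a)
c5 = (("P(a)", False), ("Q(a)", False))                  # not P(a) v not Q(a)
factored = sfac(c3, rank)
```

In this sketch `factored` is the clause not P(a) v Q(a), while `sfac(c5, rank)` leaves the clause unchanged because its maximal literal is negative.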

Although the superposition calculus itself is independent of a partial model 
and may learn non-redundant clauses, the completeness proof of superposition 
in [1] is based on a strategy that builds ground partial models according to
the fixed ordering $\prec$, where minimal false ground instances of clauses
then trigger non-redundant superposition inferences. Note that the completeness proof
relies on a grounding of the clause set that may lead to infinitely many clauses. 
However, the strategy from the completeness proof can also be seen as a super- 
position strategy for an initial clause set, where all clauses are already ground. 
On ground, finite clause sets, superposition restricted to the strategy only infers 
non-redundant clauses, always terminates, and is complete. The partial model 
needed in each step of the strategy is constructed according to the following 
model operator: 


Definition 3 (Superposition Model Operator). Let N be a set of ground
clauses. Then $N_{\mathcal{I}}$ is the Herbrand model according to the
superposition model operator for clause set N and it is constructed
recursively over the partial Herbrand models $N_C$ for all $C \in N$:

$$N_C = \bigcup_{D \prec C} \delta_D \qquad N_{\mathcal{I}} = \bigcup_{C \in N} \delta_C$$

$$\delta_D = \begin{cases} \{B\} & \text{if } D = D' \lor B,\ B \text{ strictly maximal},\ N_D \not\models_H D \\ \emptyset & \text{otherwise} \end{cases}$$

We say that a clause C is productive (wrt. the model construction of a clause
set N) if $\delta_C \neq \emptyset$. We say that a clause C produces an atom B
(wrt. the model construction of a clause set N) if $\delta_C = \{B\}$.
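
The recursive model construction can be sketched in Python for finite ground clause sets. The sketch is an illustration under assumptions, not the authors' implementation: atoms are strings, the ordering is given by a hypothetical `rank` table, and the names `model_operator`, `is_true`, and `producer` are introduced here. Because clauses are visited in ascending order, the set `model` equals the partial model of the current clause at the moment that clause is inspected.

```python
from typing import Dict, Iterable, List, Set, Tuple

Literal = Tuple[str, bool]      # (ground atom, True if positive)
Clause = Tuple[Literal, ...]


def literal_key(lit: Literal, rank: Dict[str, int]) -> Tuple[int, int]:
    return (rank[lit[0]], 0 if lit[1] else 1)


def clause_key(clause: Clause, rank: Dict[str, int]) -> List[Tuple[int, int]]:
    return sorted((literal_key(lit, rank) for lit in clause), reverse=True)


def is_true(clause: Clause, model: Set[str]) -> bool:
    """Truth of a ground clause in a Herbrand model (a set of atoms)."""
    return any((atom in model) == positive for atom, positive in clause)


def model_operator(clauses: Iterable[Clause], rank: Dict[str, int]):
    """Return the Herbrand model and a map from each produced atom to the
    clause that produces it."""
    model: Set[str] = set()
    producer: Dict[str, Clause] = {}
    for clause in sorted(set(clauses), key=lambda c: clause_key(c, rank)):
        if not clause or is_true(clause, model):
            continue
        top = max(clause, key=lambda lit: literal_key(lit, rank))
        # Productive only if the maximal literal is positive and strictly maximal.
        if top[1] and clause.count(top) == 1:
            model.add(top[0])
            producer[top[0]] = clause
    return model, producer


# Clause set of Example 9 with the assumed atom ordering P(a) < P(b) < Q(a).
rank = {"P(a)": 0, "P(b)": 1, "Q(a)": 2}
c1 = (("P(a)", True),)
c2 = (("P(b)", False), ("Q(a)", True))
c3 = (("P(a)", False), ("Q(a)", True), ("Q(a)", True))
c4 = (("P(a)", True), ("Q(a)", False))
c5 = (("P(a)", False), ("Q(a)", False))
c6 = (("P(a)", False), ("Q(a)", True))       # the factored version of c3
model0, producer0 = model_operator([c1, c2, c3, c4, c5], rank)
model1, producer1 = model_operator([c1, c2, c3, c4, c5, c6], rank)
```

On the initial clause set only the first clause is productive; once the factored clause is added, it produces Q(a) as well.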


After constructing the model $N_{\mathcal{I}}$ for a clause set N, the
strategy selects the smallest clause in N that is false in $N_{\mathcal{I}}$.
The strategy then selects a fitting inference rule based on the reason why the
clause is false in $N_{\mathcal{I}}$. The newly inferred clause either changes
the model in the next step or changes the smallest clause that is false. This
is the strategy used in the superposition completeness proof [1].


Definition 4 (Minimal False Clause). The minimal false clause $C \in N$ is
the smallest clause in N according to $\prec$ such that
$N_C \cup \delta_C \not\models_H C$.


Definition 5 (Superposition Model-Operator Strategy: SUP-MO). 
The superposition model-operator strategy is defined over the minimal false 
clause with regards to the current clause set N. The strategy can encounter the 
following cases: 


(1) N has no minimal false clause. Then N is satisfied by $N_{\mathcal{I}}$
and we can stop the superposition run.

(2) The minimal false clause in N is $\bot$. Then N is unsatisfiable, which
means we can also stop the superposition run.

(3) C is the minimal false clause in N, and it has a maximal literal L that
is negative. Then there must be a clause $D \in N$ with $D \prec C$, a
strictly maximal literal comp(L), and $\delta_D = \{\mathrm{comp}(L)\}$. In
this case, the strategy applies as its next step Superposition Left to C
and D.

(4) C is the minimal false clause in N, and it has a maximal literal L that
is positive. Then L is not strictly maximal in C and the strategy applies
Factoring to C.


The first two cases of the SUP-MO strategy also describe its final states 
according to [1]. In all other states there is always exactly one rule applicable 
according to the SUP-MO strategy, which also means that SUP-MO is never 
stuck. 


Lemma 6 (SUP-MO Applicability). Let N be a set of ground clauses. If N
has a minimal false clause $C \neq \bot$, then there exists exactly one rule
applicable to N according to the SUP-MO strategy.
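
The four cases of the SUP-MO strategy can be sketched as a loop over finite ground clause sets. The following Python code is a hedged illustration rather than the authors' procedure: the function names, the string encoding of atoms, the `rank` table, and the `max_steps` guard are assumptions of the sketch. Clauses are stored as sorted tuples so that equal multisets have a single representation.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Literal = Tuple[str, bool]      # (ground atom, True if positive)
Clause = Tuple[Literal, ...]


def literal_key(lit: Literal, rank: Dict[str, int]) -> Tuple[int, int]:
    return (rank[lit[0]], 0 if lit[1] else 1)


def clause_key(clause: Clause, rank: Dict[str, int]) -> List[Tuple[int, int]]:
    return sorted((literal_key(lit, rank) for lit in clause), reverse=True)


def is_true(clause: Clause, model: Set[str]) -> bool:
    return any((atom in model) == positive for atom, positive in clause)


def minimal_false_clause(clauses: Set[Clause], rank: Dict[str, int]):
    """Build the model in ascending clause order and return the first clause
    that is still false after its own production step, plus the producers."""
    model: Set[str] = set()
    producer: Dict[str, Clause] = {}
    for clause in sorted(clauses, key=lambda c: clause_key(c, rank)):
        if clause and not is_true(clause, model):
            top = max(clause, key=lambda lit: literal_key(lit, rank))
            if top[1] and clause.count(top) == 1:
                model.add(top[0])
                producer[top[0]] = clause
        if not is_true(clause, model):
            return clause, producer
    return None, producer


def sup_mo(clauses: Iterable[Clause], rank: Dict[str, int], max_steps: int = 1000):
    """Run the strategy; returns ("sat" | "unsat", list of inferred clauses)."""
    current: Set[Clause] = {tuple(sorted(c)) for c in clauses}
    derived: List[Clause] = []
    for _ in range(max_steps):
        false_clause, producer = minimal_false_clause(current, rank)
        if false_clause is None:            # case (1): model satisfies N
            return "sat", derived
        if not false_clause:                # case (2): empty clause is minimal
            return "unsat", derived
        top = max(false_clause, key=lambda lit: literal_key(lit, rank))
        rest = list(false_clause)
        rest.remove(top)
        if top[1]:                          # case (4): Factoring
            new = tuple(sorted(rest))
        else:                               # case (3): Superposition Left
            partner = list(producer[top[0]])
            partner.remove((top[0], True))
            new = tuple(sorted(rest + partner))
        current.add(new)
        derived.append(new)
    raise RuntimeError("step limit reached")


# Introductory example with the assumed atom ordering P(a) < Q(b).
rank = {"P(a)": 0, "Q(b)": 1}
n0 = [
    (("P(a)", True), ("P(a)", True)),
    (("P(a)", False), ("Q(b)", True)),
    (("Q(b)", False),),
]
status, derived = sup_mo(n0, rank)
```

On the introductory clause set the sketch infers P(a) by Factoring, then not P(a) and the empty clause by Superposition Left, which mirrors the refutation described in the introduction.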


3 SCL Simulates Superposition 


In general, it is not possible to simulate all inferences of the superposition cal- 
culus with SCL because SCL only learns/infers non-redundant clauses, whereas 
syntactic superposition inferences have no such guarantees. Moreover, the infer- 
ences by SCL are all based on conflicts according to a partial model driven by 
the satisfiability of clause instances, whereas the inferences by superposition are
based on a static ordering $\prec$. We can mitigate these differences by restricting
superposition with the SUP-MO strategy because SUP-MO has non-redundancy 
guarantees and it infers new clauses based on minimal false clauses with respect 
to a ground partial model. 

Let $N^0$ be a set of ground clauses, totally ordered by a superposition
reduction ordering $\prec$. Let $N^i$ (for $i > 0$) be the result of i steps
of the superposition calculus applied to $N^0$ according to the SUP-MO
strategy, i.e.,
$N^0 \Rightarrow_{\mathrm{SUP\text{-}MO}} N^1 \Rightarrow_{\mathrm{SUP\text{-}MO}} \cdots \Rightarrow_{\mathrm{SUP\text{-}MO}} N^i$.
Again, all $N^i$ are sets of ground clauses, totally ordered by a
superposition reduction ordering $\prec$. The SCL strategy SCL-SUP that
simulates superposition restricted to SUP-MO runs is defined inductively on
the clause ordering $\prec$. To guide and to prove the correctness of our
simulation, we assign to each SCL state and every clause some additional
information. For this purpose, every SCL state is annotated with a triple
$(i, C, \gamma)$, where i is an integer that states that the SCL state
simulates the superposition state $N^i$, C is the last clause that was used as
a decision aid by the strategy, and $\gamma$ is a function such that
$\gamma(C) = \mathrm{sfac}(C)$ if $\mathrm{sfac}(C) \in N^i$ and
$\gamma(C) = C$ otherwise. The SCL state also simulates the model construction
for $N^i$ up to $N^i_{C'} \cup \delta_{C'}$, where $C' = \gamma(C)$. The
annotated states are written
$(\Gamma; N^0; U; \beta; k; E)_{(i,C,\gamma)}$. The overall start state is
then $(\epsilon; N^0; \emptyset; \beta; 0; \top)_{(0,\bot,\gamma)}$, where we
assume $\beta$ large enough so $A \prec_B \beta$ for all
$A \in \mathrm{atom}(N^0)$, $\bot \notin N^0$, and
$\gamma(C) = \mathrm{sfac}(C)$ if $\mathrm{sfac}(C) \in N^0$ and
$\gamma(C) = C$ otherwise. We will later see that the annotated integer is not
relevant for the actual choice of SCL rules by the SCL-SUP strategy but only
to prove that the strategy actually simulates superposition. Moreover, we
define a new ordering $\prec_\gamma$ based on our superposition ordering
$\prec$ and function $\gamma$ such that $C \prec_\gamma D$ if
$\gamma(C) \prec \gamma(D)$.
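
A small Python sketch may help to see how the annotated function changes the clause ordering. It is illustrative only: the function is modeled as a dictionary that maps a clause to its image and defaults to the identity, atoms are strings, and `gamma_key` and `rank` are hypothetical names. The data is taken from Example 9.

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, bool]      # (ground atom, True if positive)
Clause = Tuple[Literal, ...]


def literal_key(lit: Literal, rank: Dict[str, int]) -> Tuple[int, int]:
    return (rank[lit[0]], 0 if lit[1] else 1)


def clause_key(clause: Clause, rank: Dict[str, int]) -> List[Tuple[int, int]]:
    return sorted((literal_key(lit, rank) for lit in clause), reverse=True)


def gamma_key(clause: Clause, gamma: Dict[Clause, Clause],
              rank: Dict[str, int]) -> List[Tuple[int, int]]:
    """Compare clauses by the images of the annotated function; clauses
    without an entry in the dictionary are mapped to themselves."""
    return clause_key(gamma.get(clause, clause), rank)


# Assumed atom ordering P(a) < P(b) < Q(a), as in Example 9.
rank = {"P(a)": 0, "P(b)": 1, "Q(a)": 2}
c2 = (("P(b)", False), ("Q(a)", True))
c3 = (("P(a)", False), ("Q(a)", True), ("Q(a)", True))
gamma0: Dict[Clause, Clause] = {}                              # identity
gamma1 = {c3: (("P(a)", False), ("Q(a)", True))}               # c3 maps to its factored version
before = gamma_key(c2, gamma0, rank) < gamma_key(c3, gamma0, rank)
after = gamma_key(c3, gamma1, rank) < gamma_key(c2, gamma1, rank)
```

Both `before` and `after` evaluate to true here: the second clause is smaller than the third in the original ordering, and the order flips once the third clause is treated as its factored version.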


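Definition 7 (State Simulation). Let
$(\Gamma; N^0; U; \beta; k; E)_{(i,D,\gamma)}$ be an SCL state for the input
clauses $N^0$. Let L be the maximal literal in D if $D \neq \bot$ and the
minimal literal according to $\prec$ otherwise. Let
$N^0 \Rightarrow_{\mathrm{SUP\text{-}MO}} N^1 \Rightarrow_{\mathrm{SUP\text{-}MO}} \cdots \Rightarrow_{\mathrm{SUP\text{-}MO}} N^i$
be the superposition run following the SUP-MO strategy starting from the input
clause set $N^0$. Let $D' = \gamma(D)$. Then we say that the SCL state
$(\Gamma; N^0; U; \beta; k; E)_{(i,D,\gamma)}$ simulates $N^i$ and the model
construction up to $N^i_{D'} \cup \delta_{D'}$ if


(i) $\mathrm{atom}(N^0) = \mathrm{atom}(N^i) = \mathrm{atom}(N^0 \cup U)$,
$A \prec_B \beta$ for all $A \in \mathrm{atom}(N^0)$, and
$D \in \{\bot\} \cup N^0 \cup U$
(ii) $\mathrm{sfac}(N^0 \cup U) \subseteq \mathrm{sfac}(N^i)$ and
$\gamma(C) \in N^i$ for all $C \in N^0 \cup U$ and
$\gamma(C) = \mathrm{sfac}(C)$ or $\gamma(C) = C$
(iii) for all $C \in N^i$ there exists a $C' \in N^0 \cup U \cup \{E\}$ such
that $\mathrm{sfac}(C) = \mathrm{sfac}(C')$ if the maximal literal in C is
positive
(iv) for all $C \in N^i$ there exists a $C' \in N^0 \cup U \cup \{E\}$ such
that $C' \models C$ and $\gamma(C') \preceq C$
(v) for all atoms A occurring in $N^0$: $A \in N^i_{D'} \cup \delta_{D'}$ iff
$A \in \Gamma$
(vi) for all atoms A: $\neg A \in \Gamma$ iff $A \prec L$ and
$A \notin N^i_{D'}$
(vii) for every literal L in $\Gamma$, i.e., $\Gamma = \Gamma', L, \Gamma''$,
and all literals $L'$ in $\Gamma'$,
$\mathrm{atom}(L') \prec \mathrm{atom}(L)$
(viii) for every atom (= positive literal) B in $\Gamma$, i.e.,
$\Gamma = \Gamma', B, \Gamma''$, there exists $C \in N^0 \cup U$ and a
$C' \in N^i$ such that
$\gamma(C) = \mathrm{sfac}(C) = \mathrm{sfac}(C') = C'$, and $C'$ produces B,
i.e., $\delta_{C'} = \{B\}$
(ix) for every clause $C \in N^i$ with $C \preceq \gamma(D)$ that produces an
atom B, i.e., $\delta_C = \{B\}$, there exists $C' \in N^0 \cup U$ such that
$C = \gamma(C')$ and $C' \preceq_\gamma D$
(x) $\Gamma$ contains only decisions if $E = \top$
(xi) $E \notin \{\top, \bot\}$ iff
$\Gamma = \Gamma', B^{\mathrm{sfac}(D)}$, $\Gamma'$ contains only decisions,
there exists $E' \in N^i$ where $\gamma(E) = E = E'$ is the minimal false
clause in $N^i$, and $\neg B \in E$
(xii) $\Gamma \models C$ for all $C \in N^0 \cup U$ with $C \prec_\gamma D$
(xiii) Conflict is not applicable to
$(\Gamma; N^0; U; \beta; k; E)_{(i,D,\gamma)}$
(xiv) $\bot \in N^0 \cup U$ and $E = \bot$ iff $\Gamma = \epsilon$ and
$\bot \in N^i$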


The above invariants can be summarized as follows: (i) All ground atoms
encountered are known from the start and the trail bound $\beta$ is large
enough so SCL can Decide/Propagate them. (ii)-(iv) Every initial clause C or
inferred clause by SUP-MO must coincide with an initial clause $C'$ or learned
clause by SCL; this means on the one hand that for every clause C learned by
SCL-SUP, SUP-MO infers a clause $C'$ that is identical up to factoring; on the
other hand it means that for every clause C inferred by SUP-MO, SCL-SUP learns
a clause $C'$ that entails C (i.e., $C' \models C$) and is at most as large as
C wrt. $\gamma$. (v)-(ix) The partial models constructed by SCL-SUP and SUP-MO
coincide, and any atom B in $N_C \cup \delta_C$ produced by clause D has a
clause $D'$ on the SCL side that could propagate B and vice versa. (x)-(xiii)
Ensure that any Conflict in SCL-SUP corresponds to a minimal false clause and
that the trail is always constructed in such a way that the Resolve
applications per Conflict call are limited to the maximal literal in the
conflict; this property is needed because otherwise the next clause learned by
SCL would no longer coincide with the clauses learned by SUP-MO. (xiv)
Describes the final state in case the input clause set is unsatisfiable.

Now that we have defined what an SCL state must look like in order to
simulate a superposition state, we define SCL-SUP, the SCL strategy that
eventually simulates a SUP-MO run. First, note that not all states visited by
SCL-SUP satisfy the invariants of Definition 7. However, the invariants hold
again after each so-called atomic sequence of SCL-SUP steps. Second, one
atomic sequence of SCL-SUP steps may skip over several successive
superposition states. The reason is that SCL can and must skip all steps of
SUP-MO that occur because the maximal literal in a clause is not strictly
maximal, i.e., superposition Factoring steps. SCL performs factoring
implicitly in its Propagate rule, so SCL never has to explicitly simulate case
(4) of Definition 5. Third, the definition of the SCL-SUP strategy is split in
two parts and each part describes some atomic sequences of SCL-SUP steps.


Definition 8 (SCL Superposition Strategy: SCL-SUP Part 1). Let
$S_0 = (\Gamma; N^0; U; \beta; k; \top)_{(i,C,\gamma)}$ be an SCL state with
additional annotations for the strategy. Let D be the next largest clause from
C in the ordering $\prec_\gamma$ with respect to the ground clause set
$N^0 \cup U$. Let L be the maximal literal of D. Let
$[\neg A_1, \neg A_2, \ldots, \neg A_n]$ be all negative literals such that
for all i we have $A_i \prec L$, all $A_i$ undefined in $\Gamma$, $A_i$ occurs
in $N^0 \cup U$, and $A_i \prec A_{i+1}$. Let $D' = \gamma(D)$ be in $N^i$
such that $\mathrm{sfac}(D) = \mathrm{sfac}(D')$. Let $j_0 + 1$ be the number
of occurrences of L in $D'$ and $j = i + j_0$. Then the SCL Superposition
Strategy (SCL-SUP) performs the following steps to $S_0$ (possibly without any
actual SCL rule applications, just changing the state annotation):


(1) First decide all literals $[\neg A_1, \neg A_2, \ldots, \neg A_n]$ in
order, i.e.,
$S_0 \Rightarrow^{\mathrm{Decide}^*}_{\mathrm{SCL\text{-}SUP}} S_1$, where
$S_1 = (\Gamma, \neg A_1^{k+1}, \ldots, \neg A_n^{k+n}; N^0; U; \beta; k+n; \top)_{(i,D,\gamma)}$.
(2a) If the maximal literal L in D is positive (i.e., L = B),
$\Gamma, \neg A_1^{k+1}, \ldots, \neg A_n^{k+n} \not\models D$, and Conflict
is not applicable to
$S_2 = (\Gamma, \neg A_1^{k+1}, \ldots, \neg A_n^{k+n}, B^{k+n+1}; N^0; U; \beta; k+n+1; \top)_{(j,D,\gamma')}$,
then decide B, i.e.,
$S_1 \Rightarrow^{\mathrm{Decide}}_{\mathrm{SCL\text{-}SUP}} S_2$, where
$\gamma'$ is the same as $\gamma$ except that
$\gamma'(D) = \mathrm{sfac}(D)$.
(2b) If the maximal literal L in D is positive (i.e., L = B),
$\Gamma, \neg A_1^{k+1}, \ldots, \neg A_n^{k+n} \not\models D$, and E is the
smallest clause in $N^0 \cup U$ that is false wrt.
$\Gamma, \neg A_1^{k+1}, \ldots, \neg A_n^{k+n}, B^{\mathrm{sfac}(D)}$, then
propagate B and apply Conflict to E, i.e.,
$S_1 \Rightarrow^{\mathrm{Propagate}}_{\mathrm{SCL\text{-}SUP}} S_3 \Rightarrow^{\mathrm{Conflict}}_{\mathrm{SCL\text{-}SUP}} S_2$,
where
$S_2 = (\Gamma, \neg A_1^{k+1}, \ldots, \neg A_n^{k+n}, B^{\mathrm{sfac}(D)}; N^0; U; \beta; k+n; E)_{(j,D,\gamma')}$
and $\gamma'$ is the same as $\gamma$ except that
$\gamma'(D) = \mathrm{sfac}(D)$.
(2c) Otherwise, $S_2 = S_1$ and no further rules have to be applied.

A (potentially empty) sequence of SCL rule applications according to SCL-SUP
is called an atomic sequence of SCL-SUP steps if it starts from a state $S_0$
and ends in a state $S_2$ outlined in the cases (2a)-(2c).


The first part of the strategy simulates the recursive construction of the
partial model used in the SUP-MO strategy (see Definition 3). It assumes that
the model is already constructed up to the current annotated clause C and
extends this model for the next largest clause $D \in (N^0 \cup U)$. To this
end, it uses the rule Decide in step (1) to set all atoms A to false that are
still undefined but can no longer be produced by any clause greater or equal
to D. Next the strategy makes a case distinction. Step (2a) handles the case
where D corresponds to a clause $D'$ in the superposition state (modulo some
Factoring steps skipped by SCL) that produces atom B; SCL-SUP then adds B to
the trail with the rule Decide because producing/adding this atom does not
falsify a clause. Step (2b) handles a case similar to step (2a), but here
producing/adding the atom B to the trail results in a minimal false clause E;
in order to force a resolution step between clauses D and E, SCL-SUP first
uses Propagate to add B to the trail and then applies Conflict to E. Step (2c)
handles the case where D corresponds to a clause $D'$ that will not produce an
atom B even modulo some Factoring steps; in this case no further SCL rule
applications are necessary as the SUP-MO model will not change. Note that the
annotated function $\gamma$ is needed so the SCL state knows when the
superposition state would have applied Factoring to a clause C, which also
means that it is now treated as its factorized version
$\gamma(C) = \mathrm{sfac}(C)$ in our inductive clause ordering.


Example 9. Let us now further demonstrate the three different cases of the
first part of the SCL-SUP strategy with the help of an example. Let $N^0$ be
our initial set of clauses:

$$N^0 = \{(C_1)\ P(a),\ (C_2)\ \neg P(b) \lor Q(a),\ (C_3)\ \neg P(a) \lor Q(a) \lor Q(a),\ (C_4)\ P(a) \lor \neg Q(a),\ (C_5)\ \neg P(a) \lor \neg Q(a)\}$$

We compare the run of SCL-SUP for $N^0$ with the run of SUP-MO for $N^0$ to
demonstrate that both runs coincide. As superposition ordering, we choose an
LPO with precedence $a \prec b \prec P \prec Q$. This means that the atoms are
ordered $P(a) \prec P(b) \prec Q(a) \prec Q(b)$ and the clauses in $N^0$ are
ordered $C_1 \prec C_2 \prec C_3 \prec C_4 \prec C_5$. The initial SUP-MO
state is simply the clause set $N^0$ and the initial SCL-SUP state is
$(\epsilon, N^0, \emptyset, \beta, 0, \top)_{(0,\bot,\gamma_0)}$, where
$\gamma_0(C) = C$ for all clauses C. In the first step of SCL-SUP, SCL-SUP
first selects the clause $C_1$ as its new decision aid because it is the next
largest clause in $N^0$ compared to $\bot$. Then SCL-SUP continues with step
(1) of Definition 8. In this step SCL-SUP does nothing because there are no
atoms smaller than P(a). Next, SCL-SUP detects that the maximal literal of
$C_1$ is positive, $\epsilon \not\models C_1$, and that the trail $[P(a)^1]$
does not result in a conflict. Therefore, SCL-SUP follows step (2a) of
Definition 8 and Decides P(a), which results in the state
$([P(a)^1], N^0, \emptyset, \beta, 1, \top)_{(0,C_1,\gamma_0)}$. Meanwhile,
SUP-MO starts with constructing a model for $N^0$ starting with the clause
$C_1$. The result is that $C_1$ is productive and $\delta_{C_1} = \{P(a)\}$
and $N^0_{C_1} = \emptyset$, which coincides with our new SCL trail.

SCL-SUP considers the clause $C_2$ as its new decision aid and continues with
step (1) of Definition 8. This time there is a still undefined atom smaller
than the maximal literal of $C_2$, namely P(b). Therefore, SCL-SUP Decides
$\neg P(b)$ in step (1) of Definition 8, which results in
$([P(a)^1, \neg P(b)^2], N^0, \emptyset, \beta, 2, \top)_{(0,C_2,\gamma_0)}$.
Next, SCL-SUP detects that the maximal literal of $C_2$ is positive but that
$[P(a)^1, \neg P(b)^2] \models C_2$. Therefore, SCL-SUP follows step (2c) of
Definition 8 and ends this atomic sequence immediately. SUP-MO continues the
model construction for $N^0$ with the clause $C_2$. The clause $C_2$ is not
productive because $N^0_{C_2} \models_H C_2$, where
$N^0_{C_2} = \delta_{C_1} = \{P(a)\}$ and $\delta_{C_2} = \emptyset$, which
again coincides with our new SCL trail as Herbrand models do not explicitly
define atoms assigned to false.

SCL-SUP now considers the clause $C_3$ as its new decision aid and continues
with step (1) of Definition 8. In this step SCL-SUP does nothing because all
atoms smaller than $Q(a)$ are already assigned. Next, SCL-SUP detects that the
maximal literal of $C_3$ is positive, $[P(a)^1, \neg P(b)^2] \not\models C_3$, and that the clause
$C_5$ is false with respect to the trail $[P(a)^1, \neg P(b)^2, Q(a)^{sfac(C_3)}]$. Therefore,
SCL-SUP follows step (2b) of Definition 8, i.e. it Propagates $Q(a)$ and applies
Conflict to $C_5$, resulting in
$([P(a)^1, \neg P(b)^2, Q(a)^{sfac(C_3)}], N^0, \emptyset, \beta, 2, C_5)_{(1,C_3,\gamma_1)}$, where $\gamma_1$
is identical to $\gamma_0$ except that $\gamma_1(C_3) = sfac(C_3) = \neg P(a) \lor Q(a)$. Note that
SCL-SUP must change the state annotations because the maximal literal in $C_3$
is not strictly maximal, so SCL-SUP skips and eventually silently performs the
Factorization step performed by SUP-MO. Note also that in the changed clause
ordering $\prec_{\gamma_1}$, the order of $C_2$ and $C_3$ changed, i.e., $C_3 \prec_{\gamma_1} C_2$, which
corresponds to $sfac(C_3) \prec C_2$. Meanwhile, SUP-MO continues the model construction
for $N^0$ with the clause $C_3$. The clause $C_3$ is not productive because the maximal
literal is not strictly maximal, so $\delta_{C_3} = \emptyset$ and $N^0_{C_3} \cup \delta_{C_3} \not\models C_3$, so $C_3$ is
the minimal false clause in $N^0$. SUP-MO resolves this conflict by applying Factoring
to $C_3$, which means SUP-MO infers the clause $C_6 = sfac(C_3) = \neg P(a) \lor Q(a)$. The
new clause order in superposition state $N^1 = N^0 \cup \{C_6\}$ is
$C_1 \prec C_6 \prec C_2 \prec C_3 \prec C_4 \prec C_5$, which matches the changed ordering
$C_3 \prec_{\gamma_1} C_2$ because $C_6 = \gamma_1(C_3)$. Next, SUP-MO updates its model construction
for $N^1$. The result is that $C_1$ and $C_6$ are productive and that
$N^1_{C_6} \cup \delta_{C_6} = \{P(a), Q(a)\}$, which matches the current SCL trail. Moreover, if
we continue the model construction up to $C_5$ then no new literals are produced and
$C_5$ also turns into the minimal false clause for $N^1$.
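
The model construction described above can be replayed mechanically. The following is a minimal sketch, not the authors' implementation: it assumes ground clauses without equality, takes the clause sets $N^0$ and $N^1$ as they are listed in Example 11, and uses hypothetical helper names (`lit_key`, `clause_key`, `construct_model`). A clause is treated as productive when it is false in the model built so far and its maximal literal is positive and strictly maximal.

```python
# Atom order of the running example: P(a) < P(b) < Q(a) < Q(b).
ATOMS = ["P(a)", "P(b)", "Q(a)", "Q(b)"]

def lit_key(lit):
    """A literal is (atom, positive); for equal atoms the negative literal is larger."""
    atom, positive = lit
    return (ATOMS.index(atom), 0 if positive else 1)

def clause_key(clause):
    """Multiset extension of a total literal order: compare the literals,
    sorted in descending order, lexicographically."""
    return sorted((lit_key(l) for l in clause), reverse=True)

def is_true(clause, model):
    """A ground clause holds in a Herbrand model (a set of true atoms)
    if at least one of its literals agrees with the model."""
    return any((atom in model) == positive for atom, positive in clause)

def construct_model(clauses):
    """Return (model, names of productive clauses, minimal false clause or None)."""
    order = sorted(clauses, key=lambda name: clause_key(clauses[name]))
    model, productive = set(), []
    for name in order:
        clause = clauses[name]
        if is_true(clause, model):
            continue
        top = max(clause, key=lit_key)
        # Productive only if the maximal literal is positive and strictly maximal.
        if top[1] and clause.count(top) == 1:
            model.add(top[0])
            productive.append(name)
    false = [name for name in order if not is_true(clauses[name], model)]
    return model, productive, (false[0] if false else None)

N0 = {
    "C1": [("P(a)", True)],
    "C2": [("P(b)", False), ("Q(a)", True)],
    "C3": [("P(a)", False), ("Q(a)", True), ("Q(a)", True)],
    "C4": [("P(a)", True), ("Q(a)", False)],
    "C5": [("P(a)", False), ("Q(a)", False)],
}
# C6 = sfac(C3) is the clause inferred by Factoring.
N1 = dict(N0, C6=[("P(a)", False), ("Q(a)", True)])

for label, clause_set in (("N0", N0), ("N1", N1)):
    model, productive, minimal_false = construct_model(clause_set)
    print(label, sorted(model), productive, minimal_false)
```

Under these assumptions the sketch reports $C_3$ as the minimal false clause of $N^0$ and, after adding $C_6$, the partial model $\{P(a), Q(a)\}$ with $C_5$ as the minimal false clause of $N^1$, in line with the run described in the example.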


Definition 10 (SCL Superposition Strategy: SCL-SUP Part 2). Let
$S_0 = (\Gamma, B^{sfac(C)}; N^0; U; \beta; k; E)_{(i,C,\gamma)}$ be an SCL state with $E \notin \{\top, \bot\}$ and
additional annotations for the strategy. Let $L = \neg B$ be the maximal literal of
$E$. Let $\Gamma$ contain only decision literals. Let all atoms $A$ occurring in $N^0 \cup U$
with $A \prec B$ be defined in $\Gamma$ following the order $\prec$, i.e., for all $A$ occurring in
$N^0 \cup U$ with $A \prec B$ there exist $\Gamma'$ and $\Gamma''$ such that $\Gamma = \Gamma', L_A, \Gamma''$, $L_A = A$
or $L_A = \neg A$ and all atoms $A' \in N^0 \cup U$ with $A' \prec A$ are defined in $\Gamma'$. Let $E$ be
contained in $N^i$. Let $j_0$ be the number of occurrences of $L$ in $E$ and $j = i + j_0$.
Let $sfac(C) = C_1 \lor B$ and $E = E' \lor E''$, where $E''$ contains all occurrences of $L$
in $E$. Then the SCL Superposition Strategy (SCL-SUP) performs the following
steps to $S_0$:

(1) First apply Resolve to $E$ until all occurrences of $L$ are resolved away, i.e.,
$S_0 \Rightarrow^{\text{Resolve}^*}_{\text{SCL-SUP}} S_2$, where $S_2 = (\Gamma, B^{sfac(C)}; N^0; U; \beta; k; E_2)_{(j,C,\gamma)}$ and
$E_2 = E' \lor C_1 \lor \ldots \lor C_1$.

(2a) If $E_2 = \bot$, then we apply Skip until the trail is empty and then stop the
SCL run, i.e., $S_2 \Rightarrow^{\text{Skip}^*}_{\text{SCL-SUP}} S_5$, where $S_5 = (\varepsilon; N^0; U; \beta; 0; \bot)_{(j,C,\gamma)}$.

(2b) If $E_2 \neq \bot$, then $E_2$ has a maximal literal $L_1$. Next the strategy applies Skip
until $comp(L_1)$ is the topmost literal on the trail, i.e., $S_2 \Rightarrow^{\text{Skip}^*}_{\text{SCL-SUP}} S_3$,
where $S_3 = (\Gamma_0, comp(L_1)^{k_1}; N^0; U; \beta; k_1; E_2)_{(j,C,\gamma)}$. (Note that this step skips at
least over the literal $B^{sfac(C)}$.)

(3) Next apply Backtrack to $S_3$, i.e., $S_3 \Rightarrow^{\text{Backtrack}}_{\text{SCL-SUP}} S_4$, where
$S_4 = (\Gamma_0; N^0; U \cup \{E_2\}; \beta; k_1 - 1; \top)_{(j,C,\gamma)}$.

(4a) If $L_1$ is a negative literal, continue with the following rule applications.
Let $D$ be the smallest clause in $N^0 \cup U$ with maximum literal $comp(L_1) = B_1$
and $\Gamma_0 \not\models D$. Then Propagate $B_1$ from $D$, and apply Conflict to $E_2$,
i.e., $S_4 \Rightarrow^{\text{Propagate}}_{\text{SCL-SUP}} \Rightarrow^{\text{Conflict}}_{\text{SCL-SUP}} S_5$, where
$S_5 = (\Gamma_0, B_1^{sfac(D)}; N^0; U \cup \{E_2\}; \beta; k_1 - 1; E_2)_{(j,D,\gamma)}$.

(4b) If $L_1$ is a positive literal (i.e., $L_1 = B_1$) and Conflict is not applicable to
$S_5 = (\Gamma_0, B_1^{k_1}; N^0; U \cup \{E_2\}; \beta; k_1; \top)_{(j_2,E_2,\gamma')}$, then decide $B_1$, i.e.,
$S_4 \Rightarrow^{\text{Decide}}_{\text{SCL-SUP}} S_5$, where $j_1 + 1$ is the number of occurrences of $B_1$ in
$E_2$, $j_2 = j + j_1$, and $\gamma'$ is the same as $\gamma$ except that $\gamma'(E_2) = sfac(E_2)$.

(4c) If $L_1$ is a positive literal (i.e., $L_1 = B_1$) and $E_3$ is the smallest clause in
$N^0 \cup U$ that is false in $S_5' = (\Gamma_0, B_1^{sfac(E_2)}; N^0; U \cup \{E_2\}; \beta; k_1 - 1; \top)_{(j_2,E_2,\gamma')}$,
then propagate $B_1$ and apply Conflict to $E_3$, i.e.,
$S_4 \Rightarrow^{\text{Propagate}}_{\text{SCL-SUP}} \Rightarrow^{\text{Conflict}}_{\text{SCL-SUP}} S_5$, where
$S_5 = (\Gamma_0, B_1^{sfac(E_2)}; N^0; U \cup \{E_2\}; \beta; k_1 - 1; E_3)_{(j_2,E_2,\gamma')}$, $j_1 + 1$ is the
number of occurrences of $B_1$ in $E_2$, $j_2 = j + j_1$, and $\gamma'$ is the same as $\gamma$
except that $\gamma'(E_2) = sfac(E_2)$.

A (potentially empty) sequence of SCL rule applications according to SCL-SUP
is called an atomic sequence of SCL-SUP steps if it starts from a state $S_0$ and
ends in a state $S_5$ outlined in the cases (2a) and (4a-c).


The second part of the strategy simulates the actual inferences resulting from
a minimal false clause found in step (2b) of Definition 8 or found in steps (4a) and
(4c) of Definition 10. These inferences always correspond to Superposition Left
steps of the SUP-MO strategy that resolve minimal false clauses $E'$ in $N^i$ with
maximal literal $\neg B$ with the clause $C'$ in $N^i$ that produced $B$. Note however that
SCL-SUP may combine several Superposition Left steps of the SUP-MO strategy
into one new learned clause. This is the case whenever the maximal literal $\neg B$
in the minimal false clause $E'$ in $N^i$ is not strictly maximal. In this case, the
next minimal false clause $E''$ will always correspond to the last inferred clause,
the maximal literal of this clause will still be $\neg B$, the clause producing $B$ will be
again $C'$, and therefore the next Superposition Left partner of $E''$ is also again
$C'$. Moreover, all of the skipped inferences are actually redundant with respect to
the final inference $E_2'$ in this chain, which explains why SCL-SUP is still capable
of simulating SUP-MO although it skips the intermediate inferences. The actual
SCL-SUP clause $E_2$ corresponding to the final SUP-MO inference $E_2'$ is computed in
the steps (1) and (2) of Definition 10 with greedy applications of the rules Resolve
and Factorize. The following steps of Definition 10 take care of the four different
cases how $E_2'$ changes the model and minimal false clause in $N^j$. The first case
is that $E_2' = \bot$ so SUP-MO has reached a final state. This case is handled by
step (2a) of Definition 10 that simply empties the trail with applications of the
rule Skip so the resulting SCL state has the form of a SCL-SUP final state. The
second case is that the maximal literal $L_1$ in $E_2'$ is negative. In this case, the
model for $N^i$ and $N^j$ is still the same and just the minimal false clause changes
to $E_2'$. This case is handled by steps (2b)–(4a) of Definition 10 that Backtrack
before $comp(L_1)$ was decided, propagate it instead and apply Conflict to $E_2$. In
the third and fourth case the maximal literal $L_1$ in $E_2'$ is positive. In this case,
the model for $N^i$ and $N^j$ actually changes because $E_2'$ is always productive. Case
(2b)–(4b) of Definition 10 handles the case where producing $L_1$ leads to no new
minimal false clause, and case (2b)–(4c) of Definition 10 handles the case where
it does. Both cases work symmetrically to steps (2a) and (2b) of Definition 8.


Example 11. We continue Example 9 to demonstrate cases (1)–(4a) and
(1)–(2a) of the second part of the SCL-SUP strategy. We left the runs in
the SCL state $([P(a)^1, \neg P(b)^2, Q(a)^{sfac(C_3)}], N^0, \emptyset, \beta, 2, C_5)_{(1,C_3,\gamma_1)}$ that
simulates the superposition state $N^1$, where


$N^1 = \{(C_1)\ P(a),\ (C_2)\ \neg P(b) \lor Q(a),\ (C_3)\ \neg P(a) \lor Q(a) \lor Q(a),$
$\quad (C_4)\ P(a) \lor \neg Q(a),\ (C_5)\ \neg P(a) \lor \neg Q(a),\ (C_6)\ \neg P(a) \lor Q(a)\}$


and $C_5$ became the minimal false clause in $N^1$ after $C_1$ and $C_6$ produced together
the partial model $\{P(a), Q(a)\}$. SUP-MO continues from the state $N^1$ by applying
Superposition Left to $C_5$ and $C_6$. In the new state
$N^2 = N^1 \cup \{(C_7)\ \neg P(a) \lor \neg P(a)\}$ the new clause order is
$C_1 \prec C_7 \prec C_6 \prec C_2 \prec C_3 \prec C_4 \prec C_5$ and
after constructing the model for $C_1$, which produces again $P(a)$, the clause
$C_7$ becomes again the minimal false clause. SCL-SUP follows (1) of Definition 10
and applies Resolve to $C_5$ and $sfac(C_3) = C_6$, resulting in the state
$([P(a)^1, \neg P(b)^2, Q(a)^{sfac(C_3)}], N^0, \emptyset, \beta, 2, C_7)_{(2,C_3,\gamma_1)}$. Then SCL-SUP continues
with steps (2b) and (3) by applying Skip twice and Backtrack once to jump
to the state $(\varepsilon, N^0, \{C_7\}, \beta, 0, \top)_{(2,C_3,\gamma_1)}$. Next, SCL-SUP continues with step
(4a) because the maximal literal of $C_7$ is $\neg P(a)$ and therefore negative. This
means SCL-SUP will add $P(a)$ again to the trail but this time by applying
Propagate to $C_1$ and afterwards it applies Conflict to $C_7$. The resulting state
$([P(a)^{sfac(C_1)}], N^0, \{C_7\}, \beta, 0, C_7)_{(2,C_1,\gamma_1)}$ matches again the SUP-MO state $N^2$.

SUP-MO continues from the state $N^2$ by applying Superposition Left to $C_7$
and $C_1$, resulting in $N^3 = N^2 \cup \{(C_8)\ \neg P(a)\}$. Since $C_8$ has the same maximal
literal as $C_7$ it becomes automatically the next minimal false clause in $N^3$. As a
result, SUP-MO applies Superposition Left to $C_8$ and $C_1$, which returns
$N^4 = N^3 \cup \{(C_9)\ \bot\}$, a final state that proves the unsatisfiability of $N^0$. Meanwhile,
SCL-SUP simulates both Superposition Left steps with one atomic SCL-SUP
sequence. It starts with step (1) of Definition 10 and applies Resolve twice. Since
Resolve only changes the conflict clause and the annotations, this results in the
state $([P(a)^{sfac(C_1)}], N^0, \{C_7\}, \beta, 0, \bot)_{(4,C_1,\gamma_1)}$. Then
it continues with step (2a) of Definition 10 and applies Skip until the trail is
empty. The resulting state $(\varepsilon, N^0, \{C_7\}, \beta, 0, \bot)_{(4,C_1,\gamma_1)}$ is a final state and proves
unsatisfiability of $N^0$.
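
The way one atomic SCL-SUP sequence absorbs several Superposition Left steps can be illustrated with a small sketch of step (1) of Definition 10. It is a simplified model under stated assumptions: clauses are ground multisets of `(atom, positive)` pairs, the helper names `resolve_once` and `resolve_all` are hypothetical, and trail handling, annotations and Factorize are left out.

```python
def resolve_once(conflict, reason, atom):
    """One ground resolution step on `atom`: drop one occurrence of the
    negative literal from the conflict clause and one occurrence of the
    positive literal from the reason clause, then join the rest."""
    c, r = list(conflict), list(reason)
    c.remove((atom, False))
    r.remove((atom, True))
    return c + r

def resolve_all(conflict, reason, atom):
    """Greedily resolve until the conflict clause no longer contains the
    negative literal; returns the resulting clause and the step count."""
    steps = 0
    while (atom, False) in conflict:
        conflict = resolve_once(conflict, reason, atom)
        steps += 1
    return conflict, steps

C1 = [("P(a)", True)]
C5 = [("P(a)", False), ("Q(a)", False)]
C6 = [("P(a)", False), ("Q(a)", True)]   # sfac(C3)

# First atomic sequence: one Resolve step yields C7 = not P(a) or not P(a).
C7, first_steps = resolve_all(C5, C6, "Q(a)")

# Second atomic sequence: two Resolve steps against C1 pass through C8
# and end in the empty clause, which stands for bottom.
empty, second_steps = resolve_all(C7, C1, "P(a)")
print(C7, first_steps, empty, second_steps)
```

The step count returned by `resolve_all` plays the role of $j_0$ in Definition 10, i.e. the number of simulated superposition states that one atomic sequence advances.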


Example 12. The next example demonstrates the atomic sequence (1)–(4c) of
the second part of the SCL-SUP strategy. Let $N^0$ be our initial set of clauses:


$N^0 = \{(C_1)\ P(a),\ (C_2)\ \neg P(b),\ (C_3)\ \neg P(a) \lor Q(a),\ (C_4)\ P(b) \lor \neg Q(a)\}$


As superposition ordering, we choose an LPO with precedence $a \prec b \prec P \prec Q$.
This means that the atoms are ordered $P(a) \prec P(b) \prec Q(a) \prec Q(b)$ and the
clauses in $N^0$ are ordered $C_1 \prec C_2 \prec C_3 \prec C_4$. In order to keep the example
short, we skip the initial SCL-SUP steps and continue directly with the state
$S = ([P(a)^1, \neg P(b)^2, Q(a)^{sfac(C_3)}], N^0, \emptyset, \beta, 2, C_4)_{(0,C_3,\gamma)}$, where $\gamma(C) = C$ for all
clauses $C$ and $\beta = Q(b)$. This state simulates the superposition state $N^0$ up to
the model construction for $C_3$, where $N^0_{C_3} \cup \delta_{C_3} = \delta_{C_1} \cup \delta_{C_3} = \{P(a), Q(a)\}$ and
$C_4$ is the minimal false clause. SUP-MO continues from the state $N^0$ by applying
Superposition Left to $C_4$ and $C_3$. In the new state $N^1 = N^0 \cup \{(C_5)\ \neg P(a) \lor P(b)\}$
the new clause order is $C_1 \prec C_5 \prec C_2 \prec C_3 \prec C_4$ and the partial model
up to $C_5$ is $N^1_{C_5} \cup \delta_{C_5} = \delta_{C_1} \cup \delta_{C_5} = \{P(a)\} \cup \{P(b)\}$, which turns $C_2$
into the next minimal false clause. SCL-SUP simulates the above steps by
following the atomic sequence (1)–(4c) of Definition 10: after Backtrack it
propagates $P(b)$ from the learned clause $C_5$ and applies Conflict to $C_2$. The result
is the state $([P(a)^1, P(b)^{sfac(C_5)}], N^0, \{C_5\}, \beta, 1, C_2)_{(1,C_5,\gamma)}$ matching again our
current superposition state and model.

Without clause $C_2$, SCL-SUP would apply the atomic sequence
(1)–(4b) of Definition 10 to $S$, resulting in the state
$([P(a)^1, P(b)^2], N^0 \setminus \{C_2\}, \{C_5\}, \beta, 2, \top)_{(1,C_5,\gamma)}$. This matches the state
$N^1 \setminus \{C_2\}$ and its partial model up to $C_5$ that is still the same as for $N^1$
with the exception that it does not lead to a minimal false clause.


In order to actually show that every SCL-SUP run simulates a SUP-MO run, 
we need to prove three properties. The first property is that each state visited by 
an SCL-SUP run must simulate a state visited by the corresponding SUP-MO 
run. Note that this property does not yet say anything about the order in which 
SCL-SUP simulates the SUP-MO states. This property can also be seen as a 
soundness argument for our strategy. 


Lemma 13 (Initial SCL State Simulates Initial Superposition State).
The initial SCL state $(\varepsilon; N^0; \emptyset; \beta; 0; \top)_{(0,\bot,\gamma)}$ simulates the initial superposition
state $N^0$ and the model construction up to $N^0_{\bot} \cup \delta_{\bot}$.


Lemma 14 (SCL-SUP Preserves Simulation). Let the SCL state
$S = (\Gamma; N^0; U; \beta; k; E)_{(i,C,\gamma)}$ simulate the superposition state $N^i$ and the
corresponding model construction up to $N^i_{C'} \cup \delta_{C'}$, where $C' = \gamma(C)$. Let the SCL state
$S' = (\Gamma'; N^0; U'; \beta; k'; E')_{(j,D,\gamma')}$ be the result of one atomic sequence of
SCL-SUP steps. Then there exists a clause $D' \in N^j$ with $\gamma'(D) = D'$ and $S'$ simulates
the superposition state $N^j$ and the model construction up to $N^j_{D'} \cup \delta_{D'}$.

The second property is that each atomic sequence of SCL-SUP steps always
makes progress in the simulation. This means that each atomic sequence of
SCL-SUP steps either advances the superposition state $N^i$ simulated by the current
SCL state $S = (\Gamma; N^0; U; \beta; k; E)_{(i,D,\gamma)}$, i.e., it increases the annotated $i$, or it
still simulates the same superposition state $N^i$ but advances the simulation of
the model construction operator, i.e. it increases the annotated clause $D$ and
keeps $\gamma$ the same. Note that it can actually happen that an atomic sequence of
SCL-SUP steps skips over several superposition states. This property can also
be seen as a termination argument for our strategy because SUP-MO always
terminates on ground clause sets.


Lemma 15 (SCL-SUP Advances the Simulation). Let the SCL state
$S = (\Gamma; N^0; U; \beta; k; E)_{(i,D,\gamma)}$ simulate the superposition state $N^i$ and the model
construction up to $N^i_{\gamma(D)} \cup \delta_{\gamma(D)}$. Let the SCL state
$S' = (\Gamma'; N^0; U'; \beta; k'; E')_{(j,D',\gamma')}$
be the next state reachable by one atomic sequence of SCL-SUP steps. Then
either $i < j$ or $i = j$ and $\gamma = \gamma'$ and $D \prec_{\gamma} D'$.


The last missing property shows that the SCL-SUP strategy can always
advance the current SCL state whenever the simulated superposition state can
be advanced by the SUP-MO strategy. This means SCL-SUP is never stuck when
SUP-MO can still progress. This property holds because the simulation invariants
in Definition 7 either correspond to a correct final state or they satisfy the
preconditions of Definition 8 or Definition 10. This property can also be seen as
a partial correctness argument for our strategy.


Lemma 16 (SCL-SUP Correctness of Final States). Let the SCL state
$S = (\Gamma; N^0; U; \beta; k; E)_{(i,D,\gamma)}$ simulate the superposition state $N^i$ and the model
construction up to $N^i_{\gamma(D)} \cup \delta_{\gamma(D)}$. Let there be no more states reachable from $S$
following an atomic sequence of SCL-SUP steps. Then $S$ is a final state, i.e.,
either (i) $E = \bot$, $D = \bot$, $\bot \in N^i$, and $N^0$ is unsatisfiable or (ii) $\Gamma \models N^0$.


We can also show that any SCL-SUP run is also a regular run. Although
this is not strictly necessary for the simulation proof, it is beneficial because it
means that SCL-SUP inherits many properties that hold for SCL restricted to
a regular strategy. For instance, all learned clauses are non-redundant and
SCL-SUP always terminates.


Lemma 17 (SCL-SUP is a Regular SCL Strategy). SCL-SUP is a regular
SCL strategy if it is executed on a state $S = (\Gamma; N^0; U; \beta; k; E)_{(i,C,\gamma)}$ that
simulates a superposition state $N^i$ and the corresponding model construction up to
$N^i_{\gamma(C)} \cup \delta_{\gamma(C)}$.


4 Conclusion 


We have shown that the SCL(FOL) calculus [9] can simulate model driven super- 
position [1] refutations deriving only non-redundant clauses. The superposition 
calculus cannot simulate SCL refutations due to its static a priori ordering.
In general, an SCL(FOL) learned clause is generated out of several resolution
and factorization steps. From this perspective the SCL(FOL) calculus is more 
general and flexible than the superposition calculus. Furthermore, it only gener- 
ates non-redundant clauses whereas any superposition implementation generates 
redundant clauses due to the syntactic application of the superposition inference 
rules. 

Selection in superposition can also be simulated, but requires an additional 
branch in the SCL-SUP strategy, because selection of non-maximal, negative 
literals by superposition requires a different trail ordering for SCL in order to 
simulate a respective superposition left inference. 

For future work, we plan to lift our simulation result from the ground case to 
the non-ground case. This lifting will require the extension of the SCL calculus 
by an additional rule that learns clauses that are computed as intermediate 
steps during the conflict analysis. This rule was left out of previous versions of 
SCL because we would never use it in a CDCL inspired SCL-run and because 
it would have complicated the termination and non-redundancy proofs for SCL. 
Nevertheless, we are confident that the rule can be designed in such a way that 
all properties of the original calculus still hold. 

Considering the extension to the non-ground case, this result can be used in 
various directions. It can be used to develop an alternative implementation of 
the superposition calculus. Given a fixed ordering, the trail can be developed 
according to the ordering, generating only non-redundant superposition infer- 
ences. On the other hand, the concept of finite saturation can be kept this way 
preserving a strong mechanism for detecting satisfiability. Secondly, the result 
means that SCL can be used to naturally combine propagation driven reasoning 
with fixed ordering driven reasoning. This might overcome some of the issues of 
the current first-order portfolio approaches implemented in the state-of-the-art 
provers. 

Another calculus contained in first-order reasoning portfolios is InstGen [12,
15]. It abstracts a first-order clause set to propositional logic via a grounding
with a single constant. If a CDCL SAT solver proves the abstraction unsatisfiable,
the first-order clause set is unsatisfiable too. Otherwise, the model
found on the propositional level triggers an instantiation inference of a first-order
clause. The instance rules out the previously found propositional model modulo the
abstraction.

The CDCL model building after grounding can be simulated via a respective 
SCL trail. This will then lead to a stuck state if SCL is restricted to the InstGen 
grounding. Now let C be the false first-order clause selected by InstGen for an 
instance. Then the SCL stuck state can be extended to a conflict state for C. 
Then SCL will not learn an instance of C, but a related clause that also rules out 
the previously found model on the propositional level. This way the relationship 
between InstGen and SCL can be investigated as well. 


Acknowledgements. We thank our reviewers for their careful reading and construc-

Abstract. We present a simple calculus for deriving statements about
the local behaviour of partial, continuous functions over the reals, within 
a collection of such functions associated with the elements of a finite 
partial order. We show that the calculus is sound in general and com- 
plete for particular partial orders and statements. The motivation for 
this work is drawn from an attempt to foster digitalisation in secondary-education
classrooms, in particular in experimental lessons in natural sci-
ence classes. This provides a way to formally model experiments and to 
automatically derive the truth of hypotheses made about certain phe- 
nomena in such experiments. 


Keywords: formal modelling - proof system - continuous functions - 
completeness 


1 Introduction 


Formal reasoning using proof rules is a well-established mechanism for explaining 
and deriving the truth of statements, both in general-purpose first- and higher- 
order logics [2,16] as well as special-purpose logics in arithmetic [5], knowledge 
discovery [15], program verification [13] etc. Here we are concerned with the 
problem of proving statements about the local “behaviour” of certain real-valued 
functions. A proof calculus for such simple statements may be interesting purely 
for its logical (meta-) properties. There is, however, also a very concrete motiva- 
tion for this work: digitalisation of experiments in natural sciences in secondary- 
education classrooms. Studies show how digitalisation can benefit such teaching- 
learning environments [10,18], not least by channeling pupils’ interaction through 
a software tool to enforce better learning [11]. 

In classes of natural sciences like biology, physics and chemistry, pupils are 
often taught some background knowledge about particular subjects which they 
then need to put to the test experimentally. For this, they are given a research 
question which typically asks them to discover and formulate a particular phe- 
nomenon in form of a so-called hypothesis, and to validate its correctness exper- 
imentally. Take for instance as an “experiment” in a physics class the standard 
European alternating current at 230 V and 50 Hz. The way that voltage fluctuates
over time (in other words, time influences voltage) and the way that voltage induces
(resp. influences) a current form the background theory, and a research question
could for instance be: how does the current change over time? We aim to provide
digital technology that can answer such questions automatically in order to
give valid feedback to a pupil about their success in this task.

Formal models for processes from natural sciences have been proposed in the 
literature [19], like Petri nets [6,12] or hybrid automata [1,3]. They allow for 
precise modelling of experiments; the price to pay is that of undecidability of 
model checking already, let alone validity checking. Moreover, they rely on exact 
knowledge about the nature of influences in such experiments, and this can often 
only be described by differential equations. Hence, determining correctness of a 
hypothesis requires sophisticated algebraic or numerical methods. 

Here, we model experiments abstractly as influence schemes, that is sets C of 
statements about certain parts of an influence, allowing them to be built from 
observations for instance. Correctness of a hypothesis H then is the question 
of whether H logically follows from C. We provide the framework for modelling 
experiments and hypotheses about influences in the form of a simple language 
of statements, a formal semantics via collections of partial continuous functions, 
and a proof calculus for logical consequence in this language. We show that it is 
sound in general, complete for a large and useful class of hypotheses and exper- 
iment models, i.e. influence schemes, and that it is polynomial-time decidable. 

The completeness proof uses elements that are similar to constructions for 
general logics. A key ingredient is normalisation, essentially a saturation process 
comparable to the construction of Hintikka or maximally consistent sets, cf. [7,
17]. Another one is the effective construction of countermodels for such saturated
sets, cf. [8,9,14]. The details of these constructions are of course tailored to the 
specifics of the mixed discrete-continuous structures here, dealing with properties 
of collections of (partial) continuous functions associated with pairs of elements 
of some underlying finite partial order. 

The paper is organised as follows. Section 2 introduces the mathematical basics 
in terms of functions on the reals, statements, influence schemes, hypotheses etc. 
Section 3 presents the proof calculus including its soundness. Section 4 begins by 
showing that the proof calculus is generally incomplete, as the relatively sim- 
ple statements cannot make assertions capturing certain phenomena arising with 
functions on the reals. We then develop a restriction on influence schemes and 
show that completeness does hold in this case. The full proofs of technical lemmas 
are omitted for reasons of space restriction. Section 5 discusses the computational 
problem of proof search. Section 6 concludes with remarks on further work. 


2 Modelling Influence 


Statements and Influence Schemes. In all of the following, $V = \{a, b, \ldots\}$
denotes a finite set of variables, and we assume that these are partially ordered
by $\le$, with $<$ denoting its strict part.

An interval (of reals) is denoted $[x, y]$ for $x, y \in \mathbb{Q} \cup \{-\infty, \infty\}$ with $x \le y$.
Abusing standard notation, we write, e.g. $[-\infty, 10]$ rather than $(-\infty, 10]$ for the
set of all real numbers $z$ with $z \le 10$, since we only consider intervals that are
closed at rational bounds (for purposes of effective representation) and semi-open
only at infinities. This provides a common notation for intervals and saves
us making case distinctions everywhere, depending on the interval bounds.

A V-statement is a 5-tuple $S = (a, I, q, I', b)$, typically written as $a\,^{I}q^{I'}\,b$,
s.t. $a, b \in V$ with $a < b$, and $I, I'$ are intervals in the sense above. $I$ is called the
domain, denoted dom(S), and required to be a non-singleton interval. $I'$ is the
range, denoted rng(S). Finally, $q \in Q := \{\nearrow, \searrow, \rightarrow, \leadsto\}$ is called a behaviour. It
describes a gradient of the influence abstractly as either monotonic, antitonic,
constant or arbitrary. When the variables $a, b$ involved in the statement $S$ are
clear from or irrelevant for the context, we also often simply write $^{I}q^{I'}$.

The statement $S$ is used to formalise the assertion “variable $a$ influences
variable $b$ on the interval $I$ in a way described by $q$, s.t. varying the value for $a$ in
this interval results in $b$ taking values from the interval $I'$.”

A V-influence scheme, or simply influence scheme if V is clear from the con- 
text, is a finite set C of V-statements. Intuitively, an influence scheme describes 
the way that certain variables influence each other in an abstract way. 


Example 1. We build an influence scheme for the AC-voltage experiment. The
relevant variables are $t$ for time, $v$ for voltage and $c$ for current, ordered by
$t < v < c$. A theory of how voltage alternates over time (in the standard European
alternating 230 V/50 Hz setting) and how it induces a current at a resistance
of 326 Ω can be formalised as follows. Remember that a scheme is a finite
set of statements like $t\,^{[0,5]}\!\nearrow^{[0,326]}\,v$ etc. Each can easily be visualised as a
rectangle in the 2-dimensional plane for the pair of involved variables: horizontal
and vertical edges determine domain and range, and the behaviour can be shown
as a label on the rectangle. A particular influence scheme C with 20 statements
is shown in Fig. 1 as grey rectangles in this way. The behaviours in the graph in
the middle are left out for better visibility; they are all supposed to be $\nearrow$.

The orange lines in the graphs of Fig. 1 represent a so-called influence exper- 
iment, as it will be explained below. At this point, it can be used to show that 
influence schemes as formal models of experiments can be obtained through data 
sampling. Note how the borders of the rectangles in the scheme C coincide with 
values of the functions represented by the orange lines in most cases. 


Note that the scheme C shown in Fig. 1 contains no statements for the pair 
(t, c) of variables. This does not mean that time does not influence current in this 
scheme: clearly, if time influences voltage, and voltage influences current, then 
time exerts some influence on current. Hence, a valid question asks whether
the statement H shown as a blue rectangle follows logically from the scheme C in 
the sense that whenever time influences voltage and voltage influences current 
in the way described by C, does time then also influence current in the way 
described by H? We use the letter H for such a statement as it plays the role 
of a hypothesis: in logical terms it is just a statement, but from an application 
point of view it is special in that it signifies an implicit question about its truth
with respect to a scheme.

Fig. 1. An influence scheme (grey rectangles), a hypothesis (blue dashed rectangle)
and an influence experiment (orange lines) between time, voltage and current. (Color 
figure online) 


A Formal Semantics. In order to give a well-defined meaning to the question 
whether H follows from C for a given scheme C and hypothesis H, we introduce a 
formal interpretation of statements in so-called influence experiments. We need 
to recall and define a few technicalities about functions over the reals. 

An influence is a (partial) function $f : \mathbb{R} \to \mathbb{R}$ s.t. dom(f) is a non-singleton
interval in the sense above, and $f$ is continuous on its domain in the usual sense. We
write $f(x) = \bot$ if $x \notin dom(f)$. When composing partial functions we assume
undefined values to be absorbing, i.e. $g(f(x)) = \bot$ if $f(x) = \bot$.

An influence $f$ is called monotonic, antitonic or constant on $[x, y] \subseteq dom(f)$,
if for all $z, z' \in [x, y]$ with $z < z'$ we have $f(z) \le f(z')$, respectively $f(z) \ge f(z')$
and $f(z) = f(z')$. It satisfies the statement $S = {}^{[x,y]}q^{[x',y']}$, written $f \models S$,
if the following two conditions are met.


1. $f(z) \in [x', y']$ for all $z \in [x, y]$.
2. $q = \nearrow$ and $f$ is monotonic on $[x, y]$, or $q = \searrow$ and $f$ is antitonic on $[x, y]$, or
$q = \rightarrow$ and $f$ is constant on $[x, y]$, or $q = \leadsto$.


Since every constant function is monotonic and antitonic, and each of these
is also an arbitrary one, we naturally obtain a partial order $\preceq$ on behaviours
that features unique infima and suprema, shown in Fig. 3. Note that, whenever
$f \models {}^{I}q^{I'}$ and $q \preceq q'$ then also $f \models {}^{I}q'^{\,I'}$.
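
The order on behaviours is small enough to write down explicitly. The sketch below is an illustration only: the enum `Behaviour` and the functions `leq`, `sup` and `inf` are hypothetical names, and the order is read off the sentence above (constant below monotonic and antitonic, both below arbitrary) rather than from Fig. 3 itself.

```python
from enum import Enum

class Behaviour(Enum):
    CONST = "constant"
    MONO = "monotonic"
    ANTI = "antitonic"
    ARB = "arbitrary"

def leq(q1, q2):
    """q1 is at most q2: every influence with behaviour q1 also has behaviour q2."""
    return q1 == q2 or q1 is Behaviour.CONST or q2 is Behaviour.ARB

def sup(q1, q2):
    """Least behaviour above both arguments."""
    if leq(q1, q2):
        return q2
    if leq(q2, q1):
        return q1
    return Behaviour.ARB   # monotonic and antitonic are incomparable

def inf(q1, q2):
    """Greatest behaviour below both arguments."""
    if leq(q1, q2):
        return q1
    if leq(q2, q1):
        return q2
    return Behaviour.CONST

print(sup(Behaviour.MONO, Behaviour.ANTI), inf(Behaviour.MONO, Behaviour.ANTI))
```

With this encoding, weakening a behaviour along the order corresponds to replacing `q` by any `q2` with `leq(q, q2)`.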

We are now ready to define the formal semantics of influence schemes. 


Definition 1. Let V be as above. A V-influence experiment is a collection F of
influences, namely one function $F_{a,b}$ for each pair $(a, b)$ s.t. $a < b$, altogether
satisfying the following coherence property (CP).


– For all $a, b, c \in V$ s.t. $a < b$, $b < c$ and all $x \in \mathbb{R}$: $F_{a,c}(x) = F_{b,c}(F_{a,b}(x))$.


F satisfies the V-statement $S = a\,^{I}q^{I'}\,b$, written $F \models S$, if $F_{a,b} \models {}^{I}q^{I'}$.
F satisfies the V-influence scheme C, written $F \models C$, if $F \models S$ for all $S \in C$.


CP together with the absorption of $\bot$ in function composition is the reason
for demanding the variables to be partially ordered: $F_{a,a}$ for any variable $a$
would have to be the total identity function to satisfy CP. And then we would
have $F_{b,a} = F_{a,b}^{-1}$ for any $a, b$. Thus, by demanding that $F_{a,b}$ is only defined
whenever $a < b$ we avoid problems arising with non-invertible functions.

Example 2. Figure 1 shows a particular time-voltage-current experiment F as
three influences drawn as orange graphs. It represents the way that voltage
alternates in time along a sine curve with amplitude $230 \cdot \sqrt{2} \approx 326$ V and
frequency 50 Hz. Electric current depends linearly on voltage in this experiment,
with a factor of $\frac{1}{326}$ used here suggesting an electrical resistance of 326 Ω. The
coherence property then demands a third influence $F_{t,c}$ as their composition on
the domain of $F_{t,v} = [0, \infty]$ which is also a sine curve.

Let C be the influence scheme shown in Fig. 1 and introduced in Example 1.
Clearly $F \not\models C$ because F does not satisfy the second (degenerate) rectangle
representing the statement $t\,^{[3,7]}\!\rightarrow^{[264,264]}\,v$ and neither the fifth representing
$t\,^{[12,16]}\!\searrow^{[-310,-192]}\,v$. This is because $F_{t,v}$ is neither constant on $[3, 7]$ nor
antitonic on $[12, 16]$, and because it assumes values outside of the statements’
ranges on these domains, e.g. $F_{t,v}(5) = 326 \notin [264, 264]$ and $F_{t,v}(15) = -326 \notin
[-310, -192]$.
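
The experiment of Example 2 and the two failing statements can be probed numerically. The sketch below rests on several assumptions that are not stated in the text: time is measured in milliseconds (so that 50 Hz gives a period of 20), the amplitude is taken as exactly 326, `None` stands for $\bot$, and all function names are hypothetical. Checking a statement on finitely many sample points can refute satisfaction but never establish it, so `satisfies_on_samples` is a plausibility test, not a decision procedure.

```python
import math

def f_tv(t):
    """Voltage over time: sine curve with amplitude 326 and period 20,
    defined on [0, infinity] only."""
    if t < 0:
        return None
    return 326.0 * math.sin(2 * math.pi * t / 20.0)

def f_vc(v):
    """Current over voltage: linear with factor 1/326."""
    return v / 326.0

def compose(g, f):
    """Composition in which the undefined value None is absorbing."""
    def h(x):
        y = f(x)
        return None if y is None else g(y)
    return h

# The coherence property fixes the third influence as the composition.
f_tc = compose(f_vc, f_tv)

def satisfies_on_samples(f, dom, behaviour, rng, n=400, eps=1e-9):
    """Check range and behaviour on n + 1 evenly spaced points of a finite domain."""
    lo, hi = dom
    ys = [f(lo + (hi - lo) * i / n) for i in range(n + 1)]
    if any(y is None for y in ys):
        return False
    if any(y < rng[0] - eps or y > rng[1] + eps for y in ys):
        return False
    steps = list(zip(ys, ys[1:]))
    if behaviour == "mono":
        return all(a <= b + eps for a, b in steps)
    if behaviour == "anti":
        return all(a >= b - eps for a, b in steps)
    if behaviour == "const":
        return all(abs(a - b) <= eps for a, b in steps)
    return behaviour == "arb"

print(satisfies_on_samples(f_tv, (0, 5), "mono", (0, 326)))
print(satisfies_on_samples(f_tv, (3, 7), "const", (264, 264)))
print(satisfies_on_samples(f_tv, (12, 16), "anti", (-310, -192)))
```

Under these assumptions the first statement passes the sampled check, while the statements on $[3, 7]$ and $[12, 16]$ fail it, matching the argument in Example 2.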


Note that satisfaction of a statement S by an influence f means that the 
graph of f enters the rectangle representing S through its left edge and leaves it 
only through its right edge, and within this rectangle it displays the behaviour 
stated in S. This is the case for instance for the hypothesis H drawn as a blue 
rectangle: $F \models H$ indeed. But this does not allow any conclusion to be drawn
about whether H follows from C in any way.

The interpretation of an influence scheme through influence experiments
naturally gives rise to a notion of logical consequence: we say that the V-statement
H follows from the V-influence scheme C, written $C \models H$, if $F \models H$ for all
V-influence experiments F s.t. $F \models C$. Thus, an influence scheme C can be seen
as a finite representation of an (uncountable) number of V-experiments, which
yields the abstract nature of these schemes as mentioned in the introduction.

The semantics also gives rise to a natural notion of equivalence between
schemes: C and C' are equivalent, written $C \equiv C'$, if for all F we have $F \models C$
iff $F \models C'$. Note that this is the case iff for all hypotheses H we have $C \models H$
iff $C' \models H$. Equivalent schemes can therefore be seen as (possibly different)
descriptions of the same experimental setup, up to a certain amount of imprecision
determined by the description of the experimental setup through discrete
statements.


3 The Calculus of Influence 


The concept of consequence between a scheme and a hypothesis provides the
foundations for a logical approach to modelling experimental setups and
correctness of hypotheses w.r.t. them. Ideally, the consequence relation $\models$ would be
decidable, since this would provide a way to automatically check the correctness
of a hypothesis w.r.t. a given scheme. In this section we develop a proof-theoretic
characterisation of $\models$ in terms of a provability predicate $\vdash$. Ideally, $\vdash$ would be
sound and complete w.r.t. $\models$, i.e. a statement would follow from an influence
scheme iff it is derivable from it. Then decidability of $\vdash$ (cf. Sect. 5)
would yield the basis for automatic reasoning about influence in experimental
setups.

Fig. 2. Proof rules for correctness of a statement w.r.t. an influence scheme C. See
Fig. 3 for the definitions of $\leq$ and $\otimes$.

Henceforth, let V and a V-influence scheme C be fixed. We say that a
V-statement H is provable w.r.t. C, written $C \vdash H$, if there is a finite proof for H
in the proof system whose rules are shown in Fig. 2.

We will briefly explain the intuition behind each of them. The rule (F), which
serves as an axiom, essentially states that any statement which is part of the
scheme follows from it. (G) expresses the fact that experiments are comprised
of potentially partial functions whose domain is always some interval. It states
that any function $F_{a,b}$ which shows some certain behaviour on the interval $[x,y]$,
and some certain behaviour on the interval $[x',y']$ where $y < x'$, must also be
defined on the interval $[y,x']$. However, we cannot determine better bounds than
infinities on its values, nor a non-arbitrary behaviour there.

Rule (T) expresses the transitivity principle laid out in the coherence property
of V-experiments: when a influences b s.t. a-values in $I$ lead to b-values in $I_1$,
and $I_1 \subseteq I_2$, and b-values in $I_2$ lead to c-values in $I'$, then a-values in $I$ lead
to c-values in $I'$. Moreover, the behaviour of the influence from a to c can be
derived from the ones from a to b and from b to c via the multiplication table
for $\otimes$ shown in Fig. 3.
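
As an illustration of rule (T), the following sketch composes two statements. It is an assumption-laden example rather than the paper's implementation: the tuple encoding of statements and the names `compose` and `rule_t` are hypothetical, and since the table of Fig. 3 is not reproduced in the text, `compose` simply assumes the usual sign rule for composing monotone functions.

```python
MONO, ANTI, CONST, ANY = "mono", "anti", "const", "any"  # hypothetical names


def compose(q1: str, q2: str) -> str:
    """Behaviour of g after f when f has behaviour q1 and g has behaviour q2.

    Assumed table: a constant factor makes the composition constant, an
    arbitrary factor (without a constant one) makes it arbitrary, and
    monotonic/antitonic factors follow the sign rule.
    """
    if CONST in (q1, q2):
        return CONST
    if ANY in (q1, q2):
        return ANY
    return MONO if q1 == q2 else ANTI


def rule_t(s_ab, s_bc):
    """Apply (T) to statements encoded as (var, (x, y), q, (l, u), var).

    Returns the derived statement, or None if the rule is not applicable.
    """
    a, dom, q1, rng1, b = s_ab
    b2, dom2, q2, rng2, c = s_bc
    if b != b2:
        return None
    # Side condition: the range of the first statement must be contained
    # in the domain of the second one.
    if not (dom2[0] <= rng1[0] and rng1[1] <= dom2[1]):
        return None
    return (a, dom, compose(q1, q2), rng2, c)
```

With these assumptions, a monotonic statement from a to b followed by an antitonic one from b to c yields an antitonic statement from a to c.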

Rule ($I^-$) expresses weakening of statements w.r.t. the involved intervals.
Any function which maps values from $I_1$ to values in $I_2$ must also do so for
values from a subset of $I_1$, and their range is naturally limited by any superset
of $I_2$. On the other hand, ($I^+$) represents an important strengthening principle:
any function that maps values from $I_1$ to $I_2$ and values from $I_1'$ to $I_2'$ must map
values from $I_1 \cap I_1'$ to $I_2 \cap I_2'$. Note that the rule is only (meaningfully) applicable
if $I_1 \cap I_1' \neq \emptyset$. Moreover, the behaviour on the intersection can be determined
from those on the two involved intervals. For instance, if $F_{a,b}$ is monotonic on
$I_1$ and antitonic on $I_1'$ then it must be both monotonic and antitonic on $I_1 \cap I_1'$,
hence, it must in fact be constant there.

Fig. 3. Order $\leq$ (left) and multiplication $\otimes$ (right) on behaviours.

Rules ($L^\uparrow$)-($R^\downarrow$) express further strengthening principles which are
applicable in situations where two statements are made about the behaviour of a
function on adjacent intervals. Suppose for instance, that $F_{a,b}$ maps values from
$[x,y]$ monotonically into $[l,u]$, and values from $[y,z]$ somehow into $[l',u']$. In
particular, we have $F_{a,b}(y) \leq u$ since $y \in [x,y]$, and $F_{a,b}(y) \leq u'$ since $y \in [y,z]$,
i.e. $F_{a,b}(y) \leq \min(u,u')$. By monotonicity, for all $z'$ with $x \leq z' \leq y$ we must
have $F_{a,b}(z') \leq \min(u,u')$ as well. Hence, from the knowledge about the
monotonic behaviour of $F_{a,b}$ on $[x,y]$ and the upper bound on an adjacent interval
to the right of it, we can possibly infer a tighter upper bound on the values of
$F_{a,b}$ on $[x,y]$. This is what rule ($L^\uparrow$) does. The other three rules ($L^\downarrow$), ($R^\uparrow$) and
($R^\downarrow$) cover the analogous cases of the behaviour being antitonic or the adjacent
statement being on the other side.
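
The bound-tightening argument above can be phrased as a one-step computation. This is a minimal sketch covering only the case spelled out in the text (monotonic statement, neighbour to the right); the tuple encoding and the name `tighten_upper` are hypothetical.

```python
def tighten_upper(stmt, right):
    """Tighten the upper bound of a monotonic statement using its right neighbour.

    Statements are encoded as ((x, y), behaviour, (l, u)). If `stmt` is
    monotonic on [x, y] and `right` starts at y with upper bound u', then all
    values on [x, y] are bounded by min(u, u').
    """
    dom, q, (l, u) = stmt
    rdom, _rq, (_rl, ru) = right
    if q != "mono" or dom[1] != rdom[0]:
        return stmt  # not the situation described above: leave unchanged
    return (dom, q, (l, min(u, ru)))
```

For example, a monotonic statement on [0,1] with range [0,5] next to a statement on [1,2] with range [2,3] is strengthened to range [0,3].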

Rule (J) can be used to infer statements about the behaviour of a function
on parts of its domain which are comprised of several intervals. If $F_{a,b}$ maps
values from $[x,y]$ into $I_1$ with behaviour $q$, and values from $[y,z]$ into $I_2$ with
behaviour $q'$, then it maps values from $[x,z]$ into $I_1 \cup I_2$, provided that this is an
interval. Moreover, the behaviour on the larger interval can be determined from
$q$ and $q'$ by simply taking the supremum w.r.t. $\leq$. This is obviously associative,
which allows us to write $\sup_{\leq}(q_1,\ldots,q_n)$ without ambiguity.


Note that (J) is also a weakening rule: for instance, from $S_1 = a\;[0,1] \sim [0,1]\;b$
and $S_2 = a\;[1,2] \sim [1,2]\;b$ we can infer $S = a\;[0,2] \sim [0,2]\;b$, describing any
influence $F_{a,b}$ that maps values from $[0,2]$ to $[0,2]$, for instance $F_{a,b}(x) = 2 - x$.
I.e. we have $F \models S$, but $F \not\models S_1$ and $F \not\models S_2$. Likewise, ($Q^-$) allows the
weakening of behaviours. It states that a function which possesses a certain
behaviour on an interval also possesses any weaker behaviour on this interval.
At last, rule (C) expresses a simple principle: an influence of variable a onto
b whose values can be bounded by a singleton interval is of constant behaviour.
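
The joining rule (J) and its weakening effect can be sketched as follows. The encoding of statements as tuples, the behaviour names and the functions `sup` and `rule_j` are hypothetical; the order on behaviours is assumed to place constant below monotonic and antitonic, and both below arbitrary, which matches the remark that a function that is monotonic and antitonic must be constant.

```python
MONO, ANTI, CONST, ANY = "mono", "anti", "const", "any"  # hypothetical names


def sup(q1: str, q2: str) -> str:
    """Supremum of two behaviours in the assumed order const <= mono, anti <= any."""
    if q1 == q2:
        return q1
    if q1 == CONST:
        return q2
    if q2 == CONST:
        return q1
    return ANY


def rule_j(s1, s2):
    """Join statements ((x, y), q, (l, u)) and ((y, z), q', (l', u')).

    Returns None if the domains are not adjacent or if the union of the two
    ranges is not an interval.
    """
    (x, y), q1, (l1, u1) = s1
    (y2, z), q2, (l2, u2) = s2
    if y != y2 or max(l1, l2) > min(u1, u2):
        return None
    return ((x, z), sup(q1, q2), (min(l1, l2), max(u1, u2)))
```

Applied to the two statements of the example above, `rule_j` returns the statement with domain [0,2], arbitrary behaviour and range [0,2].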


Example 3. A proof of $C \vdash H$ for the scheme C and the hypothesis
$H = t\;[12.5,15] \searrow [-1.05,-0.5]\;c$ shown in Fig. 1 (cf. Example 1) is given in Fig. 4. The
subtrees that are abbreviated by vertical dots are very similar to their siblings
and therefore omitted in order to keep the tree small.
The following theorem then guarantees that $C \models H$ holds, too.

Fig. 4. Proof of the hypothesis H from the scheme C in Example 1.


Theorem 1 (Soundness). Let C be an influence scheme and S be a statement.
If $C \vdash S$ then $C \models S$.

Proof. First we observe that all the rules are sound in the sense that if $C \models T$
for all premises T of some rule, then $C \models S$ for its conclusion S. This is trivial
for rule (F) and can easily be shown by contradiction for the other 11 rules.
The theorem can then easily be shown by induction on the height of a proof tree
for $C \vdash S$.


4 Completeness for Elementary Diamond-Free Schemes 


General Incompleteness. We remark that the calculus of influence is not 
complete in general. Consider the variable order a < b < c and the scheme C (in 
grey) and hypothesis H (in dashed blue) represented by the following rectangles. 


[Figure: the scheme C (grey rectangles) and the hypothesis H (dashed blue rectangle), drawn in panels with vertical axes b, c and c.]


It seems that H does not follow from C because it demands constant
behaviour of an influence $F_{b,c}$ on the interval $[1,2]$ while C only prescribes
monotonic behaviour there. However, we have $C \models H$ indeed for the
following reason: the combination of $S_1 = a\;[1,2] \nearrow [1,2]\;b$ with $b\;[1,2] \nearrow [1,2]\;c$ yields
$a\;[1,2] \nearrow [1,2]\;c$. Together with $a\;[1,2] \searrow [1,2]\;c$ we get $a\;[1,2] \rightarrow [1,2]\;c$, i.e.
we must have that $F_{a,c}$ is constant on $[1,2]$ for any F with $F \models C$. Since
$F_{a,c} = F_{b,c} \circ F_{a,b}$ and $F_{a,b}$ cannot be constant on $[1,2]$ because of the two
statements neighbouring $S_1$, we must indeed have that $F_{b,c}$ is constant on $[1,2]$. Thus,
$C \models H$ but the rules do not support this kind of backwards reasoning (from (a,c)
to (b,c)). Hence, we have $C \nvdash H$.

There are two principal ways to go from here: either extend the calculus by 
rules formalising this kind of reasoning, or try to achieve completeness for a 
restricted class of schemes and hypotheses only. We do the latter; the former 
would require a significant extension of the machinery as the example above 
shows: backwards reasoning introduces nondeterminism, and in order to resolve 
it one needs to take contexts of statements into account. This suggests that 
general completeness may only be achieved through a general extension of the 
format of rules. Note also that completeness cannot hold for a class of schemes
containing inconsistent ones, where C is said to be consistent if there is some
F s.t. $F \models C$. The reason is that we have $C \models H$ for any H whenever C is
inconsistent, even when H makes an assertion about variables not occurring in
C, in which case it is clear that H cannot be derived from C.


Normalisation. We develop some general machinery that is useful for obtaining
completeness in a restricted case. For a scheme C and variables a,b with $a < b$
we write $C_{a,b}$ for the set of statements $S \in C$ s.t. $S = a\;I\;q\;I'\;b$ for some $I, q, I'$.


Definition 2. We call a scheme C separated if for all $a,b \in V$ with $a < b$ there
are $n \in \mathbb{N}$ and $x_1 < \ldots < x_{n+1} \in \mathbb{Q} \cup \{-\infty,\infty\}$, behaviours $q_1,\ldots,q_n$ and
intervals $[l_1,u_1],\ldots,[l_n,u_n]$ s.t.

$$C_{a,b} = \{\, a\;[x_1,x_2]\;q_1\;[l_1,u_1]\;b,\;\; a\;[x_2,x_3]\;q_2\;[l_2,u_2]\;b,\;\; \ldots,\;\; a\;[x_n,x_{n+1}]\;q_n\;[l_n,u_n]\;b \,\}.$$

This induces a natural notion of left and right neighbour of a statement T in a
separated scheme, denoted lnb(T) and rnb(T) when they exist.

We say that such a separated C is minimal if for all $i = 1,\ldots,n$ we have

a) if $q_i = \nearrow$ then $u_i \leq u_{i+1}$ and $l_{i-1} \leq l_i$,
b) if $q_i = \searrow$ then $l_i \geq l_{i+1}$ and $u_{i-1} \geq u_i$,
c) if $q_i = \rightarrow$ then $u_i \leq \min(u_{i-1},u_{i+1})$ and $l_i \geq \max(l_{i-1},l_{i+1})$,

where we set $l_0 = l_{n+1} := -\infty$ and $u_0 = u_{n+1} := \infty$ to avoid case distinctions.

C is called transitive if for all $a,b,c \in V$ with $a < b < c$ and all $x,y \in \mathbb{R}$
we have the following: if $x \in I_1$, $y \in I_2$ for some statement $a\;I_1\;q\;I_2\;b \in C$,
and $y \in I_3$ for some statement $b\;I_3\;q'\;I_4\;c \in C$, then there is a statement
$a\;I_5\;q''\;I_6\;c \in C$ s.t. $x \in I_5$ and $I_6 \subseteq I_4$.

C is called normalised if it is separated, minimal and transitive.


So, intuitively, separation and minimality mean that the statements in a
normalised scheme can be arranged as a sequence of horizontally adjacent
rectangles, for each pair of variables a,b, with no gaps in between, and no statement
can be strengthened further because of its left or right neighbours (compare this
to the strengthening rules ($L^\uparrow$)-($R^\downarrow$)). Transitivity means that C is complete in
the sense that whenever it allows $F_{a,b}(x) = y$ and $F_{b,c}(y) = z$ for some $x, y, z$,
then it must also predict the possibility of $F_{a,c}(x) = z$.

Fig. 5. A normalisation C* (red) of the influence scheme C from Example 1 (grey).
(Color figure online)
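
The minimality conditions a) to c) of Definition 2 lend themselves to a direct check. The sketch below is an illustration only: the function name `is_minimal`, the behaviour names and the encoding of the statements for one pair of variables as a left-to-right list of `(behaviour, l, u)` triples are hypothetical.

```python
import math


def is_minimal(stmts):
    """Check conditions a)-c) for a separated sequence of statements.

    `stmts` lists (behaviour, l, u) for the adjacent statements of one pair of
    variables, ordered from left to right; the domains are not needed here.
    """
    n = len(stmts)
    # Sentinels l_0 = l_{n+1} = -inf and u_0 = u_{n+1} = +inf avoid case distinctions.
    l = [-math.inf] + [s[1] for s in stmts] + [-math.inf]
    u = [math.inf] + [s[2] for s in stmts] + [math.inf]
    for i in range(1, n + 1):
        q = stmts[i - 1][0]
        if q == "mono" and not (u[i] <= u[i + 1] and l[i - 1] <= l[i]):
            return False
        if q == "anti" and not (l[i] >= l[i + 1] and u[i - 1] >= u[i]):
            return False
        if q == "const" and not (u[i] <= min(u[i - 1], u[i + 1])
                                 and l[i] >= max(l[i - 1], l[i + 1])):
            return False
    return True
```

A monotonic statement with range [0,5] followed by a statement with range [2,3] is not minimal in this sense, because its upper bound could still be lowered to 3.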


Lemma 1 (Normalisation Lemma). Let C be a consistent scheme. There is
a normalised scheme C* s.t. $C^* \equiv C$ and for all $T \in C^*$ we have $C \vdash T$.


Proof. (Sketch) We successively transform C into C* using operations that follow
rule applications. (G), ($I^+$) and ($I^-$) (in restricted form) can be used to obtain
separation, ($L^\uparrow$)-($R^\downarrow$) to ensure minimality, and (T) together with (J) to ensure
transitivity. The trick is then to arrange the process of saturating C by adding
new statements and replacing some with others in a terminating way.


In the following, we will write C* to denote a normalised scheme obtained from C 
that satisfies the conditions of this lemma. Note that C* is not necessarily unique; 
for example statements with adjacent domains and equal ranges and behaviours 
can be merged using rule (J) or statements can be split w.r.t. their domain
using ($I^-$) without breaking the conditions of the lemma.


Example 4. Figure 5 shows the result of normalising the scheme C from Exam- 
ple 1 (grey rectangles) as a scheme C* with 11+25+11=47 statements shown 
as red rectangles. It should be clear that the hypothesis H, also depicted here 
as a blue rectangle, does indeed follow from C*: intuitively, it is impossible to 
draw an influence experiment into these diagrams as three functions that tra- 
verse through the red rectangles in the prescribed ways without also traversing 
through the blue rectangle correctly. 

Figure 5 suggests the use of the normalisation process for proof construction: 
a close inspection of the example proof in Fig. 4 allows the origin of the red 
rectangles touched by the hypothesis H to be traced back to the grey ones from 
the original scheme. 


Countermodel Construction. The following two lemmas contain one of the
main ingredients for obtaining a completeness result: they show how to extend
an influence built for a particular statement in a normalised scheme piecewise to one that
satisfies all the statements for the same variables in this scheme. Note that this
does not construct an influence experiment (yet) as it does not show how to
construct influences for other pairs of variables.

We first make an observation about the possibility to satisfy statements in a
normalised scheme by particular influences. A sequence $S_1,\ldots,S_n$ of statements
$S_i = a\;[x_i,y_i]\;q_i\;[x_i',y_i']\;b$ is called connected if $y_i = x_{i+1}$, i.e. $S_{i+1} = \mathrm{rnb}(S_i)$ for
all $i < n$. A connector for $S_1,\ldots,S_n$ is an influence f s.t. $\mathrm{dom}(f) = [x_1,y_n]$ and,
for all $i \leq n$, we have that $f \models [x_i,y_i]\;q_i\;[x_i',y_i']$. Such a connector f is strict
if, additionally, for all $i \leq n$ we have $f \not\models [x_i,y_i]\;q'\;[x_i',y_i']$ for any $q' < q_i$. It
is range-covering if there are $x,y \in [x_1,y_n]$ such that $f(x) = \min\{x_1',\ldots,x_n'\}$
and $f(y) = \max\{y_1',\ldots,y_n'\}$. Sometimes, we will need to construct connectors
for single statements S, which are simply sequences of length 1.


Lemma 2 (Connectors Lemma). Let C be consistent and normalised and
$S = a\;[x,x']\;q\;[y,y']\;b \in C$.

a) Suppose $x'', y'' \in \mathbb{R}$ are given s.t. $x \leq x'' \leq x'$ and $y \leq y'' \leq y'$. Then there
is a connector f for S s.t. $f(x'') = y''$.

b) Suppose $y'' \in \mathrm{rng}(\mathrm{lnb}(S)) \cap \mathrm{rng}(S)$ is given. Then there is a connector f for
S s.t. $f(x) = y''$.

c) Suppose $y'' \in \mathrm{rng}(S) \cap \mathrm{rng}(\mathrm{rnb}(S))$ is given. Then there is a connector f for
S s.t. $f(x') = y''$.

d) Let $S_1,\ldots,S_n$ be connected s.t. the behaviour of $S_i$ is not $\rightarrow$ for some i. Then
there is a strict, range-covering connector for $S_1,\ldots,S_n$.


Proof. (Sketch) Parts (a)-(c) essentially boil down to a case distinction, depend- 
ing on the behaviour q. However, it is relatively easy to observe that the require- 
ments in all three cases are always satisfiable by a function that is either linear 
or composed of two linear functions on the interval [x,x’], making use of the 
intuitive fact that in a rectangle, with two points given on the left and right 
edge and one in the middle, it is always possible to draw a (straight) line within 
this rectangle from the left point to the middle one, and then continue it to the 
right one. Part (d) requires a decomposition of the sequence $S_1,\ldots,S_n$ according
to their behaviours.


An immediate consequence of this is the possibility to build influences for not 
just a single statement in a normalised scheme, but in fact for all the statements 
concerning the same pair of variables. This crucially relies on parts (b) and (c) 
of Lemma 2. 


Lemma 3 (Small Extension Lemma). Let V be a partially ordered set of
variables, $a,b \in V$ s.t. $a < b$, and C be a consistent and normalised V-influence
scheme s.t.

$$C_{a,b} = \{\, \underbrace{a\;[x_1,x_2]\;q_1\;I_1\;b}_{T_1},\;\; \underbrace{a\;[x_2,x_3]\;q_2\;I_2\;b}_{T_2},\;\; \ldots,\;\; \underbrace{a\;[x_n,x_{n+1}]\;q_n\;I_n\;b}_{T_n} \,\}.$$

Let $1 \leq j \leq k \leq n$ and f' be a connector for $T_j,\ldots,T_k$. Then there is an
influence f s.t. $\mathrm{dom}(f) = [x_1,x_{n+1}]$, $f \models T_i$ for all $i = 1,\ldots,n$, and $f(x) =
f'(x)$ for all $x \in [x_j,x_{k+1}]$.
Completeness for Elementary Schemes over Diamond-Free Orders.
Let $\mathfrak{C}$ be a class of pairs of schemes and statements. We say that the calculus of
influence is complete for $\mathfrak{C}$ if for all $(C,S) \in \mathfrak{C}$ we have: if $C \models S$ then $C \vdash S$. We
now concentrate on a class that allows for a construction proving completeness,
and which still captures a large class of experiments and hypotheses occurring
in natural sciences; cf. the concluding section for a discussion of that.

We call a pair (a,b) of variables elementary if $a < b$ and there is no c s.t.
$a < c < b$. Any finite partial order is the (reflexive-)transitive closure of a finite
set of elementary pairs. A statement $a\;I\;q\;I'\;b$ is called elementary if (a,b) is
elementary. A scheme C is called elementary if all $T \in C$ are elementary.

We say that the partial order < is diamond-free if for all a,b,c,d: if $a < b < d$
and $a < c < d$ then $b \leq c$ or $c \leq b$. In a finite diamond-free partial order, for
every pair (a,b) with $a < b$ there is a unique sequence $c_1,\ldots,c_n$ for some $n \geq 0$
s.t. $(a,c_1)$, $(c_n,b)$ and $(c_i,c_{i+1})$ for $i = 1,\ldots,n-1$ are all elementary.
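
Both notions are easy to check on a finite order. The following brute-force sketch assumes that the strict order is given as a transitively closed set of pairs; the function names `elementary_pairs` and `is_diamond_free` are hypothetical.

```python
from itertools import product


def elementary_pairs(elems, less):
    """Pairs (a, b) with a < b and no c strictly in between."""
    return {(a, b) for (a, b) in less
            if not any((a, c) in less and (c, b) in less for c in elems)}


def is_diamond_free(elems, less):
    """Whenever a < b < d and a < c < d, the elements b and c must be comparable."""
    for a, b, c, d in product(elems, repeat=4):
        if {(a, b), (b, d), (a, c), (c, d)} <= less:
            if not (b == c or (b, c) in less or (c, b) in less):
                return False
    return True
```

A chain a < b < c is diamond-free with elementary pairs (a,b) and (b,c), whereas an order with a below two incomparable elements b and c that both lie below d is not.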

In a diamond-free elementary scheme, all derivable non-elementary state- 
ments can be traced back to applications of the transitivity rule (T). Moreover, in 
any normalisation of a diamond-free elementary scheme obtained as in Lemma 1, 
all non-elementary statements can be traced back to an application of rule (T). 


Lemma 4 (Decomposition Lemma). Let C be an elementary scheme over a
diamond-free partial order and C* be a normalisation of C obtained via Lemma 1.
Suppose $T = a\;I_1\;q\;I_4\;c \in C^*$ such that (a,c) is non-elementary. Then there is b
with $a < b < c$ and $S = a\;I_1\;q_1\;I_2\;b$ and $S'_1,\ldots,S'_m$ such that $S, S'_1,\ldots,S'_m \in C^*$,
joining $S'_1,\ldots,S'_m$ via (J) yields $S' = b\;I_3\;q_2\;I_4\;c$, and $q = q_1 \otimes q_2$, $I_2 \subseteq I_3$.


The key ingredients are that all non-elementary statements in C* are derivable
in C, and the fact that C* is normalised, whence a derivation of T in C can be
used to generate a derivation of T in C*. Note that w.l.o.g. we can assume that
$I_2 = I_3$ in the above lemma.

Now let C be an elementary diamond-free scheme. We observe that any influ- 
ence experiment that satisfies all statements in C on elementary relations auto- 
matically satisfies all derivable statements on non-elementary relations due to 
correctness of the rules in the calculus of influence, in particular their observance 
of the coherence principle. This yields the following. 


Lemma 5 (Sufficiency Lemma). Let C be an elementary and diamond-free 
scheme, and let C* be a normalisation of C obtained via Lemma 1. Then any 
influence experiment that satisfies all elementary statements in C* satisfies all 
statements of C*. 


The next lemma then contains the heart of the completeness proof. It shows 
how to construct counterexamples, in the form of specific influence experiments, 
for normalised schemes and hypotheses that appear to state something different 
to what is contained in the normalised scheme. 


Lemma 6 (Counterexample Lemma). Let C be a consistent, elementary
scheme over a diamond-free partial order and C* be a normalisation of C obtained
via Lemma 1. Let $a,b \in V$ s.t. $a < b$ and

$$C^*_{a,b} = \{\, \underbrace{a\;[x_1,x_2]\;q_1\;[l_1,u_1]\;b}_{T_1},\;\; \underbrace{a\;[x_2,x_3]\;q_2\;[l_2,u_2]\;b}_{T_2},\;\; \ldots,\;\; \underbrace{a\;[x_n,x_{n+1}]\;q_n\;[l_n,u_n]\;b}_{T_n} \,\}.$$

Let $H = a\;[x_0,y_0]\;q\;[l,u]\;b$. If one of the following conditions holds, then there is
an influence experiment F s.t. $F \models C^*$ but $F \not\models H$.

a) $x_0 < x_1$ or $y_0 > x_{n+1}$.

b) (I) $\bigcup_{h=i}^{j} [l_h,u_h] \not\subseteq [l,u]$ or (II) $\sup_{\leq}(q_i,\ldots,q_j) \not\leq q$ holds, where i and j are
the (necessarily unique) indices s.t. $x_0 \in [x_i,x_{i+1})$ and $y_0 \in (x_j,x_{j+1}]$.


Proof. (Sketch) We give a high-level, intuitive idea of the construction. If (a,b)
is elementary, it suffices to find an $F_{a,b}$ such that $F_{a,b} \models C^*_{a,b}$ but $F_{a,b} \not\models H$. The
functions for the other elementary relations can be interpreted in an arbitrary
fashion such that $F_{c,d}$ satisfies $C^*_{c,d}$ for all (c,d). This is always possible since
C, and hence C*, is consistent. The interpretations of the non-elementary
relations are then obtained automatically via the coherence principle; note that this
always satisfies any statements on the respective non-elementary relations due
to Lemma 5.

Case (a) is the simpler one. Here, $[x_0,y_0] \not\subseteq [x_1,x_{n+1}]$. Hence, it suffices to
construct an experiment F s.t. $\mathrm{dom}(F_{a,b}) = [x_1,x_{n+1}]$, whence $F \not\models H$. We can
ensure $F \models C$ by simply truncating the domain of any influence experiment
that satisfies C. Such an experiment exists since C is consistent.

For case (b), H disagrees with the statements in $C^*_{a,b}$ in at least one of two
ways: (I) it restricts the values of an experiment at some point x more than the
unique statement $T_i$ in the sequence in $C^*_{a,b}$ covering x does. Then we pick a
value y that is covered by the vertical interval in $T_i$ but not in H, use Lemma 2
(a) to obtain a connector that runs through this point (x,y) and extend it to an
influence using Lemma 3 to ensure $F \models C$ but $F \not\models H$. Or (II) the behaviour
stated in H is strictly stronger than those in the corresponding statements in
$C^*_{a,b}$. Then we obtain a strict connector for these statements using Lemma 2 (d)
and extend it accordingly using Lemma 3. Strictness ensures that the influence
$F_{a,b}$ has the behaviours required by C* but not by H, hence $F \not\models H$ as well.

If (a,b) is not elementary, by the decomposition lemma (Lemma 4) there is a
sequence $a = c_1,\ldots,c_n = b$ of elementary relations and a sequence $S_1,\ldots,S_{n-1}$
of statements derivable in C* that satisfy the requirements of Lemma 4. We omit
case (a). If we are in case (b) (I), again we pick a point (x,y) not covered by H
but by the statements in $C^*_{a,b}$. We then generate a sequence of points $(x_i,y_i)$ for
$i \leq n$ such that $x = x_1$ and $y_i = x_{i+1}$ for all $i < n$ and $y_n = y$. It then suffices to
invoke Lemma 2 (a) and Lemma 3 to complete the individual relations $F_{c_i,c_{i+1}}$
such that they go through the point $(x_i,y_i)$.

For the case (b) (II), it suffices to build interpretations of the $F_{c_i,c_{i+1}}$ that
are strict w.r.t. $S_i$. However, for $i > 1$, the statement $S_i$ might not exist in C*,
but may only be derivable via (J). We use Lemma 2 (d) to obtain a strict,
range-covering connector for the sequence of statements that derive $S_i$ and, again, use
Lemma 3 to complete it into an influence for $F_{c_i,c_{i+1}}$. Since these connectors are
range-covering, we obtain a strict interpretation for $F_{a,b}$ from these intermediate
$F_{c_i,c_{i+1}}$, which is the desired contradiction.


Theorem 2 (Completeness for Elementary Diamond-Free Schemes). 
The calculus of influence is complete for the class of consistent and elementary 
schemes over diamond-free partial orders, and arbitrary hypotheses. 


Proof. Let C be consistent and elementary, its underlying partial order < be
diamond-free. Let C* be a normalisation of C obtained via Lemma 1. Hence, C*
is also consistent. Let $H = a\;[x,y]\;q\;I\;b$ s.t. $a < b$ and $C \models H$, whence also
$C^* \models H$ since $C^* \equiv C$, and suppose that

$$C^*_{a,b} = \{\, \underbrace{a\;[x_1,x_2]\;q_1\;I_1\;b}_{T_1},\;\; \underbrace{a\;[x_2,x_3]\;q_2\;I_2\;b}_{T_2},\;\; \ldots,\;\; \underbrace{a\;[x_n,x_{n+1}]\;q_n\;I_n\;b}_{T_n} \,\}.$$

Moreover, by Lemma 1 we have $C \vdash T_i$ for all $i = 1,\ldots,n$.

If $x < x_1$ or $y > x_{n+1}$ then Lemma 6 (a) would yield a contradiction to
the assumption that $C^* \models H$. Thus, there are i and j s.t. $x \in [x_i,x_{i+1}]$ and
$y \in [x_j,x_{j+1}]$. Now we must have $\bigcup_{h=i}^{j} I_h \subseteq I$ and $\sup_{\leq}(q_i,\ldots,q_j) \leq q$ for otherwise
Lemma 6 (b) would yield a contradiction to the assumption that $C^* \models H$.

Let $T := a\;[x_i,x_{j+1}]\;\sup_{\leq}(q_i,\ldots,q_j)\;\bigcup_{h=i}^{j} I_h\;b$. By repeated applications of rule
(J), T is provable from $T_i,\ldots,T_j$, whence $C \vdash T$. Moreover, H is provable from
T by at most one application of rule ($I^-$) and ($Q^-$) each. So $C \vdash H$ as well.


The completeness proof shows that for any consistent scheme there is always 
a satisfying experiment that is composed of piecewise linear functions. One may
argue that this does not capture the heart of functional behaviour in natural 
sciences. It is possible, though, to require influences not only to be continuous 
but even differentiable (on their domains). To fulfil this requirement, one could 
simply use splines of order 3 in the proof of Lemma 2 with their first derivative 
being 0 at the left and right edges of each rectangle. 


5 Proof Search and Empirical Results 


We observe that the relation $\vdash$ between influence schemes and
hypotheses is in fact polynomial-time decidable, using a bottom-up approach.


Theorem 3. The problem of deciding, given a scheme C and a hypothesis H,
whether or not $C \vdash H$ holds, is decidable in time $|C|^{O(1)}$.


Proof. A close inspection of the proof rules shows that rule ($I^-$) can always be
pushed downwards in a proof and successive applications of it can be shortened
to a single one, s.t. $C \vdash H$ iff there is some H' which is provable from C without
using rule ($I^-$), and H can be derived from H' by a single application of ($I^-$).
Next we observe that all rules except ($I^-$) have the following property: the
bounds of domain and range of the conclusion are bounds of the domain or range
of some premise. This guarantees termination of a simple bottom-up procedure
for proof search: saturate C by applications of all rules other than ($I^-$). The
number of different statements created this way is bounded by $4 \cdot v^2 \cdot b^4 = O(|C|^6)$
where v is the number of variables occurring in C, and b is the number of different
interval bounds occurring in it. For each of these statements, check whether H
can be derived using ($I^-$). This can be done in time polynomial in $|C|$.
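
To make the counting argument concrete, the bound can be computed directly from a scheme: four behaviours, a pair of variables, and four interval bounds per statement. The sketch below only evaluates this formula; the tuple encoding of statements and the name `saturation_bound` are hypothetical, and it is not the saturation procedure itself.

```python
def saturation_bound(scheme):
    """Upper bound 4 * v^2 * b^4 on the statements a saturation can create.

    `scheme` is an iterable of statements (a, (x, y), q, (l, u), b), where
    v counts the distinct variables and b the distinct interval bounds.
    """
    variables, bounds = set(), set()
    for a, (x, y), _q, (l, u), b in scheme:
        variables |= {a, b}
        bounds |= {x, y, l, u}
    return 4 * len(variables) ** 2 * len(bounds) ** 4
```

Since every statement contributes at most two variables and four bounds, the result is in O(|C|^6), in line with the estimate in the proof.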


An implementation of a proof search tool, written in Python, is publicly
available.¹ The repository also contains formalisations of some influence schemes
and examples of statements whose derivability can be checked using the tool. A
deeper look at the implementation details is beyond the scope of this paper and
deferred for space considerations. It uses a more sophisticated top-down proof
search that constructs only the relevant part of the normalisation of a scheme, i.e.
only “around” those statements that can occur in a proof for the given hypothesis
H. This part can contain not only statements about other variables due to rule (T)
but also statements further away from H because rules ($L^\uparrow$)-($R^\downarrow$) can transmit
requirements on underlying influence experiments along the horizontal axis.


6 Conclusion 


We presented a simple language for statements about the behaviour of functions 
in a collection that can be interpreted as a way that different entities influence 
one another. We gave it a formal semantics and devised a proof calculus to char- 
acterise the (uncountable) notion of logical consequence that is generally sound 
and complete for a large class of schemes that covers typical cases occurring in 
the formal modelling of experimental setups from natural science classes. 

It remains to be seen whether the calculus can be extended logically (by fur- 
ther rules for instance) to completely capture a larger class of influence schemes. 

Future work will also comprise a number of extensions of the calculus for the
purpose of obtaining higher expressiveness. Some experimental setups are
inherently temporal in the sense that the influence which a exerts on b depends on a
value range of a and a point in time, as in “Yeast grows at temperatures between
15 and 40° during the next five minutes.” We have made a proposal to
incorporate time in [4]. It also incorporates the ability to make refined assertions about
the behaviour of an influence, as in “Voltage increase is at most 65.4 V msec$^{-1}$.”
This replaces the abstract behaviours $\nearrow$ etc. by intervals like [0, 65.4], and the
geometric interpretation of a statement becomes a trapezoid.

Formal statements could also include a third interval denoting time points, 
and influence experiments become collections of binary real-valued functions 
which interpret cuboids in three-dimensional real spaces. This would also be an 
approach to model the combined effect of several variables on another variable, 
even if the modeling of time as a special variable is not desired. 


Acknowledgement. We thank Shahla Rasulzade for discussions that have led to this 
work, and for suggesting to study a temporal extension thereof. 


1 https://github.com/SoerenMoeller/influence_solver.
Abstract. We present a theory of Cartesian arrays, which are multi-
dimensional arrays with support for the projection of arrays to sub- 
arrays, as well as for updating sub-arrays. The resulting logic is an 
extension of Combinatorial Array Logic (CAL) and is motivated by the 
analysis of quantum circuits: using projection, we can succinctly encode 
the semantics of quantum gates as quantifier-free formulas and verify 
the end-to-end correctness of quantum circuits. Since the logic is expres- 
sive enough to represent quantum circuits succinctly, it necessarily has a 
high complexity; as we show, it suffices to encode the k-color problem of a 
graph under a succinct circuit representation, an NEXPTIME-complete 
problem. We present an NEXPTIME decision procedure for the logic and 
report on preliminary experiments with the analysis of quantum circuits 
using this decision procedure. 


1 Introduction 


There has been extensive research on logics to reason about array data-types in 
programs. Arrays can concisely represent the values of an unbounded number of 
memory locations, and have been successfully applied to verify industrial-scale 
programs [11,15,29]. An array formula encoding the semantics of a program path 
is typically linear in the number of program statements. Much of the existing 
work focuses on one-dimensional arrays and uses nesting to handle the case of 
multiple dimensions. 

This paper studies a logic called Cartesian Array Logic (CaAL), in which
multi-dimensional arrays are treated as first-class citizens. The motivation for
designing this logic comes from developing a tailor-made theory for reasoning
about quantum circuits or programs, which need a fundamentally different
representation of states than classical programs. Quantum states exist in a
superposition of classical states. Figure 1 gives an example of a 5-qubit quantum state,
which can be interpreted as a probability distribution over $2^5$ classical states;
every classical state, which can be seen as a string of n bits, is associated with
a probability of being observed.

Fig. 1. A quantum state: |00000⟩ 69%, |00001⟩ 1%, ..., |11111⟩ 1%.

© The Author(s) 2023
B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 170-189, 2023.

Current SMT-based solutions for reasoning about quantum programs [3]
encode program paths into a Satisfiability Modulo Theories (SMT) formula over
the theory of real numbers. For an n-qubit quantum program, the direct encoding
uses $2^n$ variables to represent the execution of a quantum circuit, one variable
per classical state. The formula representing a quantum circuit is exponential in
the circuit size.

In the Cartesian Array Logic designed in this paper, one can instead encode 
an n-qubit quantum state as an array $s : (\mathbb{B}^n \Rightarrow \mathbb{C})$ that maps each classical
state to a complex number c encoding the probability of this classical state being
observed. The squared absolute value $|c|^2$ is the probability that the complex
number c encodes. Quantum gates, the basic operating units of a quantum circuit, 
can be viewed as functions that transform one quantum state to another. We 
show that CaAL can concisely encode the semantics of quantum gates, so that 
a path formula becomes linear in the circuit size. The semantics of a quantum 
circuit is the composition of the gate encodings. 
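
The array view of a quantum state can be illustrated with an explicit, exponentially large table. This sketch is not an encoding in CaAL, which is meant to avoid exactly such explicit tables; it only shows the intended semantics, with hypothetical names (`basis_state`, `probability`, `apply_x`) and a state stored as a dictionary from bit tuples to complex amplitudes.

```python
from itertools import product


def basis_state(bits):
    """State in which the classical state `bits` has amplitude 1, all others 0."""
    n = len(bits)
    return {idx: (1 + 0j if idx == tuple(bits) else 0j)
            for idx in product((0, 1), repeat=n)}


def probability(state, bits):
    """Probability |c|^2 of observing the classical state `bits`."""
    return abs(state[tuple(bits)]) ** 2


def apply_x(state, k):
    """NOT gate on qubit k: the amplitude of each index moves to the index
    whose k-th bit is flipped."""
    return {idx[:k] + (1 - idx[k],) + idx[k + 1:]: amp
            for idx, amp in state.items()}
```

Starting from the two-qubit state |00⟩, applying the NOT gate to qubit 1 gives a state in which (0, 1) is observed with probability 1. The table has $2^n$ entries, which is the blow-up that a succinct symbolic encoding is intended to avoid.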


Structure of the Paper. The syntax and formal semantics of the CaAL logic
will be given in Sect. 2. In the same section, we show that this logic is quite
expressive: it can easily encode the satisfiability problem of a quantified Boolean
formula (QBF). We show that deciding the logic is, in fact, NEXPTIME-hard
by a polynomial reduction from the k-color problem of a succinct circuit
representation of graphs [23]. As an application, in Sect. 3, we show that the logic can
concisely encode the semantics of quantum circuits, using $\mathbb{B}^n$ as the index type
and $\mathbb{C}$ as the value type. In Sect. 4, we present a decision procedure for CaAL,
extending the classical approach of read-over-write propagation used for arrays.
In the worst case, our procedure might perform an exponential number of such
propagations; hence, if the underlying logic can be decided in NP, our logic can
be decided in NEXPTIME. The preliminary experimental results of applying
this decision procedure to quantum circuit verification can be found in Sect. 5.


Contributions of the paper are (i) a new array logic, CaAL, with native support 
for multi-dimensional arrays; (ii) a proof that the satisfiability problem of CaAL is
NEXPTIME-hard; (iii) a linear encoding of the semantics of quantum circuits in 
CaAL; (iv) an NEXPTIME decision procedure for CaAL without nested array 
sorts; and (v) a preliminary evaluation of our approach using standard quantum 
circuits. 


Related Work on Verification of Quantum Circuits. Although quantum states can be naturally represented as arrays, the connection between array theories and quantum circuit verification is novel, to the best of our knowledge. Previous work has considered automated quantum circuit verification based on automata [7], various types of equivalence checking [1,9,19,33], abstract interpretation [24,34], and model checking [13,21,32]. However, techniques based on satisfiability modulo theories (SMT) are still lacking. The closest work to ours is a symbolic execution and verification framework for quantum circuits [3]. That work encodes quantum circuit verification problems into SMT with the theory of real numbers, using variables in trigonometric functions, e.g., sin x, which might lose precision in corner cases. As mentioned, their approach requires $2^n$ variables to encode an n-qubit circuit in the worst case. As far as we know, our work is the first SMT-based approach that allows a precise and succinct encoding and verification of quantum circuits.


Related Work on Array Theories. There is a large body of research on array decision procedures for SMT, going back to the 1980s, and most SMT solvers implement at least the theory of extensional arrays (with the operations read and write, the latter called store in our paper), as standardized in SMT-LIB [2]. Stump et al. [29] presented a decision procedure for this theory, which formed the basis for many later procedures. An extension of the theory, called Combinatory Array Logic (CAL), with functions for constant arrays and for the point-wise extension of functions was presented by De Moura et al. [11]. CAL served as the main inspiration for our work and is extended further in this paper by adding projections and updates of sub-arrays. An extension of CAL with cardinality constraints was presented by Raya et al. [25]. Christ et al. [8] present an algorithm for the theory of arrays where lemmas are created lazily based on weak equivalences; this method was later extended to handle constant arrays [20].

There are also many more generalized decision procedures for arrays. For 
instance, Ganesh et al. [16] focus on the combined theory of arrays and bit- 
vectors and present a decision procedure based on pre-processing, bit-blasting, 
and linear arithmetic solving. Brummayer et al. present a decision procedure for 
the same theory that introduces lemmas lazily, guided by congruence closure [6]. 
An extended array theory tailored to software, including operations memset and 
memcpy, was presented by Falke et al. [12]. More recently, several theories of 
finite arrays were proposed. Bonacina et al. [5] extend the standard theory of 
arrays with an abstract notion of length, and present a decision procedure based 
on the CDSAT framework. Wang et al. [31] consider a logic extending CAL with 
a length function, as well as operations for concatenation, slicing, and repetition 
of arrays, and identify a decidable fragment. Sheng et al. [27] propose a theory 
of sequences that combines the standard array operations with a length function,
concatenation, and slicing. None of those logics can directly encode quantum
circuits in a style similar to CaAL, however, since no projection operation is
available.


2 A Theory of Cartesian Arrays


2.1 Preliminaries 


We work in the setting of multi-sorted first-order logic with equality; see, e.g., [18]. A signature is a tuple $\Sigma = (\Sigma^S, \Sigma^F, \Sigma^P)$ consisting of a set $\Sigma^S$ of sorts, a set $\Sigma^F$ of function symbols, and a set $\Sigma^P$ of predicates. Predicates and functions have fixed arity and argument sorts, and functions have a fixed result sort. Given a signature $\Sigma$ and a set $\mathcal{X}$ of sorted variables, we define the usual notions of $\Sigma$-terms, $\Sigma$-atoms, $\Sigma$-literals, $\Sigma$-formulas, and $\Sigma$-sentences. Formulas are evaluated over $\Sigma$-structures $M = (D, I)$ that interpret every sort $\sigma \in \Sigma^S$ as a non-empty domain $I(\sigma) \subseteq D$, predicates $p \in \Sigma^P$ as relations $I(p)$, and functions $f \in \Sigma^F$ as set-theoretical functions $I(f)$. We slightly abuse notation; we assume that also variables $x \in \mathcal{X}$ are mapped to values $I(x)$ by $M$. The evaluation of terms, formulas, etc., is defined as is common; the equality symbol $=$ is assumed to be interpreted as the equality relation on $D$. A theory $T$ over $\Sigma$ is a set of $\Sigma$-sentences. A $\Sigma$-formula $\phi$ is called $T$-satisfiable if there is a $\Sigma$-structure $M$ satisfying both the $T$-axioms and $\phi$.


2.2 Definition of the Theory of Cartesian Arrays 


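Cartesian arrays are introduced in the context of a base signature $\Sigma_B$ and a base $\Sigma_B$-theory $T_B$, which provides the index and value sorts for arrays. The signature $\Sigma_{CaAL} = (\Sigma^S_{CaAL}, \Sigma^F_{CaAL}, \Sigma^P_{CaAL})$ of CaAL is then defined as follows. The set of sorts is the least set $\Sigma^S_{CaAL}$ such that (i) $\Sigma^S_B \subseteq \Sigma^S_{CaAL}$, and (ii) $\sigma, \tau \in \Sigma^S_{CaAL}$ and $n \in \mathbb{N}_{>0}$ imply $(\sigma^n \Rightarrow \tau) \in \Sigma^S_{CaAL}$. A sort $(\sigma^n \Rightarrow \tau)$ is an array sort of arity $n$ with index sort $\sigma$ and value sort $\tau$.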


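Table 1. Operations included in $\Sigma^F_{CaAL}$ for each sort $(\sigma^n \Rightarrow \tau)$.

Operation | Description
$\cdot[\cdot,\ldots,\cdot] : (\sigma^n \Rightarrow \tau) \times \sigma^n \to \tau$ | Reading of array values
$\mathit{store} : (\sigma^n \Rightarrow \tau) \times \sigma^n \times \tau \to (\sigma^n \Rightarrow \tau)$ | Updating of array values
$K : \tau \to (\sigma^n \Rightarrow \tau)$ | Construction of constant arrays
$\mathit{map}_f : (\sigma^n \Rightarrow \tau_1) \times \cdots \times (\sigma^n \Rightarrow \tau_k) \to (\sigma^n \Rightarrow \tau)$ | Point-wise extension of base function $f : \tau_1 \times \cdots \times \tau_k \to \tau$
$\mathit{proj}_k : (\sigma^n \Rightarrow \tau) \times \sigma \to (\sigma^{n-1} \Rightarrow \tau)$ | For $n > 1$ and $k \in \{1,\ldots,n\}$, projection to $n - 1$ of the indexes
$\mathit{arrayStore}_k : (\sigma^n \Rightarrow \tau) \times \sigma \times (\sigma^{n-1} \Rightarrow \tau) \to (\sigma^n \Rightarrow \tau)$ | For $n > 1$ and $k \in \{1,\ldots,n\}$, update of a sub-array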


The set $\Sigma^F_{CaAL}$ includes $\Sigma^F_B$, as well as the operations listed in Table 1 for every array sort $(\sigma^n \Rightarrow \tau)$. The operators $\cdot[\cdot,\ldots,\cdot]$ and $\mathit{store}$ are the functions for reading from and writing to arrays, as in the standard theory of arrays. $K$ and $\mathit{map}_f$ correspond to the functions introduced in CAL [11]; in particular, any base function $f \in \Sigma^F_B$ is lifted to an operator on arrays using $\mathit{map}_f$. The operators $\mathit{proj}$ and $\mathit{arrayStore}$ are specific to our theory CaAL, and can be used to project an n-dimensional array to an (n − 1)-dimensional sub-array by fixing the value of the k'th index, and to update the corresponding portion of the original array, respectively. The set $\Sigma^P_{CaAL}$ coincides with $\Sigma^P_B$. Semantics is defined by the axiom schemata in Table 2.
Example 1. We illustrate the use of two-dimensional arrays $s, s' : (\mathbb{B}^2 \Rightarrow \mathbb{C})$ to encode two-qubit quantum states. Suppose that $s$ represents the state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, and $s' = X_2(s)$ is the quantum state after applying an X gate (the quantum version of a “not”-gate) on the 2nd qubit of $s$. The matrix representations of $s$ and $s'$ are as follows; note that the results of $x_2 = 0$ and $x_2 = 1$ are swapped in $s$ and $s'$.

$s = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad s' = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

The projection $\mathit{proj}_1(s, k)$ maps the matrix $s$ to its k'th column vector, specifically the column with $x_1 = k$. In CaAL, we can construct $s'$ from $s$ as $s' = \mathit{arrayStore}_2(\mathit{arrayStore}_2(K(0), 1, \mathit{proj}_2(s, 0)), 0, \mathit{proj}_2(s, 1))$. To compute the sum of the two matrices, we use $\mathit{map}_+(s, s')$, which is also utilized for other quantum gate operations.
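
The construction in Example 1 can be replayed on explicit finite arrays. The following is a minimal sketch, not the solver-side semantics: arrays over $\mathbb{B}^n$ are modelled as Python dictionaries from index tuples to values, and the helper names `K`, `proj`, `array_store`, and `map_` are illustrative assumptions that mirror the operations of Table 1.

```python
from itertools import product
from math import sqrt

B = (0, 1)  # index sort: one bit per qubit

def K(value, n):
    """Constant n-dimensional array over B^n."""
    return {idx: value for idx in product(B, repeat=n)}

def proj(a, k, j):
    """proj_k(a, j): fix the k-th index (1-based) to j and drop it."""
    return {idx[:k - 1] + idx[k:]: v for idx, v in a.items() if idx[k - 1] == j}

def array_store(a, k, j, b):
    """arrayStore_k(a, j, b): overwrite the slice of a whose k-th index is j by b."""
    out = dict(a)
    for idx, v in b.items():
        out[idx[:k - 1] + (j,) + idx[k - 1:]] = v
    return out

def map_(f, *arrays):
    """map_f: point-wise extension of f to arrays of the same shape."""
    return {idx: f(*(a[idx] for a in arrays)) for idx in arrays[0]}

# s encodes (|00> + |11>)/sqrt(2); s[(x1, x2)] is the amplitude of |x1 x2>.
s = K(0, 2)
s[(0, 0)] = s[(1, 1)] = 1 / sqrt(2)

# X gate on the 2nd qubit: the slices x2 = 0 and x2 = 1 are swapped.
s_prime = array_store(array_store(K(0, 2), 2, 1, proj(s, 2, 0)), 2, 0, proj(s, 2, 1))

# map_+ computes the point-wise sum of the two arrays.
total = map_(lambda x, y: x + y, s, s_prime)
```

Here `s_prime` carries the amplitude $1/\sqrt{2}$ at the indices (0, 1) and (1, 0), i.e., it represents $\frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$.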


Several extensions of the theory of Cartesian arrays are possible but beyond 
the scope of this paper. Those include (i) arrays with multiple different index 
sorts, as opposed to just n copies of the same index sort $\sigma$; and (ii) a theory that
also includes point-wise extensions of predicates. 


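Table 2. Axioms of the Theory of Cartesian Arrays. As shorthand notation, we write $\bar{i} : \sigma^n$ for a vector of n index variables $i_1 : \sigma, \ldots, i_n : \sigma$.

$\forall a : (\sigma^n \Rightarrow \tau),\ \bar{i} : \sigma^n,\ x : \tau.\quad \mathit{store}(a, \bar{i}, x)[\bar{i}] = x$   (2)

$\forall a : (\sigma^n \Rightarrow \tau),\ \bar{i} : \sigma^n,\ \bar{j} : \sigma^n,\ x : \tau.\quad \bar{i} = \bar{j} \vee \mathit{store}(a, \bar{i}, x)[\bar{j}] = a[\bar{j}]$   (3)

$\forall x : \tau,\ \bar{i} : \sigma^n.\quad K(x)[\bar{i}] = x$   (4)

$\forall a_1 : (\sigma^n \Rightarrow \tau_1), \ldots, a_k : (\sigma^n \Rightarrow \tau_k),\ \bar{i} : \sigma^n.\quad \mathit{map}_f(a_1, \ldots, a_k)[\bar{i}] = f(a_1[\bar{i}], \ldots, a_k[\bar{i}])$   (5)

$\forall a : (\sigma^n \Rightarrow \tau),\ \bar{i} : \sigma^n.\quad \mathit{proj}_k(a, i_k)[i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_n] = a[\bar{i}]$   (6)

$\forall a : (\sigma^n \Rightarrow \tau),\ b : (\sigma^{n-1} \Rightarrow \tau),\ \bar{i} : \sigma^n.\quad \mathit{arrayStore}_k(a, i_k, b)[\bar{i}] = b[i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_n]$   (7)

$\forall a : (\sigma^n \Rightarrow \tau),\ b : (\sigma^{n-1} \Rightarrow \tau),\ \bar{i} : \sigma^n,\ j : \sigma.\quad j = i_k \vee \mathit{arrayStore}_k(a, j, b)[\bar{i}] = a[\bar{i}]$   (8)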
2.3 Complexity of Satisfiability in CaAL


We now study the hardness of satisfiability of quantifier-free CaAL formulas. The quantified Boolean formula problem (QBF) generalizes the Boolean satisfiability problem by allowing existential and universal quantifiers to be applied to variables. Its satisfiability problem is PSPACE-complete [28]. Without loss of generality, we can assume that QBF formulas are in prenex normal form $Q_1 x_1. Q_2 x_2. \cdots Q_n x_n. \phi$, which consists of a Boolean formula $\phi$ over n Boolean variables $x_1, \ldots, x_n$, and a prefix of quantifiers $Q_1, Q_2, \ldots, Q_n \in \{\forall, \exists\}$.

To reduce the satisfiability problem of QBF to CaAL, we assume that the base theory provides a sort $\mathbb{B}$ with the standard operations. This sort will be used for both indexes and values. An array $\mathit{toCaAL}(\phi) : (\mathbb{B}^n \Rightarrow \mathbb{B})$ encoding the semantics of $\phi$ is defined recursively as follows:


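— $\mathit{toCaAL}(x_k) = \mathit{arrayStore}_k(K(0), 1, K(1))$.
— $\mathit{toCaAL}(\neg\phi) = \mathit{map}_\neg(\mathit{toCaAL}(\phi))$.
— $\mathit{toCaAL}(\phi_1 \wedge \phi_2) = \mathit{map}_\wedge(\mathit{toCaAL}(\phi_1), \mathit{toCaAL}(\phi_2))$.


Observe that $\mathit{arrayStore}_k(K(0), 1, K(1))[i_1, \ldots, i_k, \ldots, i_n] = i_k$, and note that the size of $\mathit{toCaAL}(\phi)$ is linear in the size of $\phi$. We can construct a CaAL formula that is equisatisfiable with $Q_1 x_1. \cdots Q_n x_n. \phi$ as follows: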


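$\mathit{QElim}(Q_1 x_1. \cdots Q_n x_n. \phi) = (q_1[0] \odot_1 q_1[1]) \;\wedge\; \bigwedge_{i=2}^{n} q_{i-1} = \mathit{map}_{\odot_i}(\mathit{proj}_i(q_i, 0), \mathit{proj}_i(q_i, 1)) \;\wedge\; q_n = \mathit{toCaAL}(\phi)$


where $\odot_i = \wedge$ when $Q_i = \forall$, and $\odot_i = \vee$ otherwise. Note that the QBF formula $Q_1 x_1. \cdots Q_n x_n. \phi$ is valid if and only if the CaAL formula $\mathit{QElim}(Q_1 x_1. \cdots Q_n x_n. \phi)$ is satisfiable.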
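
To make the reduction concrete, the sketch below evaluates the chain $q_n, \ldots, q_1$ on explicit truth tables. It is only an illustration under stated assumptions: arrays are modelled as dictionaries over $\mathbb{B}^n$, formulas are nested tuples, and the names `to_caal` and `qelim` are hypothetical. A solver would treat the $q_i$ symbolically instead of enumerating $2^n$ entries, and the sketch applies `array_store` also for n = 1, which Table 1 formally excludes.

```python
from itertools import product

B = (0, 1)

def K(value, n):
    return {idx: value for idx in product(B, repeat=n)}

def proj(a, k, j):
    return {idx[:k - 1] + idx[k:]: v for idx, v in a.items() if idx[k - 1] == j}

def array_store(a, k, j, b):
    out = dict(a)
    for idx, v in b.items():
        out[idx[:k - 1] + (j,) + idx[k - 1:]] = v
    return out

def map_(f, *arrays):
    return {idx: f(*(a[idx] for a in arrays)) for idx in arrays[0]}

def to_caal(phi, n):
    """phi is ('var', k), ('not', p) or ('and', p, q); returns an array B^n => B."""
    tag = phi[0]
    if tag == 'var':
        return array_store(K(0, n), phi[1], 1, K(1, n - 1))
    if tag == 'not':
        return map_(lambda x: 1 - x, to_caal(phi[1], n))
    if tag == 'and':
        return map_(lambda x, y: x & y, to_caal(phi[1], n), to_caal(phi[2], n))
    raise ValueError(f"unknown connective: {tag}")

def qelim(quantifiers, phi):
    """quantifiers[i-1] is Q_i ('forall' or 'exists'); returns 1 iff the QBF is valid."""
    op = {'forall': lambda x, y: x & y, 'exists': lambda x, y: x | y}
    n = len(quantifiers)
    q = to_caal(phi, n)                      # q_n
    for i in range(n, 1, -1):                # q_{i-1} from q_i
        q = map_(op[quantifiers[i - 1]], proj(q, i, 0), proj(q, i, 1))
    return op[quantifiers[0]](q[(0,)], q[(1,)])

# x1 xor x2, written with negation and conjunction only.
x1, x2 = ('var', 1), ('var', 2)
xor = ('and', ('not', ('and', x1, x2)), ('not', ('and', ('not', x1), ('not', x2))))
forall_exists = qelim(['forall', 'exists'], xor)
exists_forall = qelim(['exists', 'forall'], xor)
```

For the two prefixes above, `forall_exists` evaluates to 1 and `exists_forall` to 0, matching the validity of the corresponding QBF formulas.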


Theorem 1. The satisfiability problem of CaAL over B is PSPACE-hard. 


This lower bound can be improved, however. The k-colorability problem for 
graphs with succinct circuit representation is NEXPTIME-complete [23]. This 
problem can be reduced to the satisfiability problem of CaAL in polynomial time 
as well. 

Consider an undirected graph with $2^n$ nodes, and let $\phi(\bar{x}, \bar{x}')$ be a Boolean circuit encoding the edge relation of the graph: $\phi(\bar{x}, \bar{x}')$ evaluates to true whenever there is an edge between $\bar{x}$ and $\bar{x}'$ in the graph. The k-colorability of the graph can be characterized by the following formula, where $c : (\mathbb{B}^n \Rightarrow \mathbb{N})$ is an array representing the color of each node:


$\forall \bar{x}, \bar{x}' : \mathbb{B}^n.\ \phi(\bar{x}, \bar{x}') \to c[\bar{x}] \neq c[\bar{x}'] \wedge c[\bar{x}] < k \wedge c[\bar{x}'] < k.$


In a similar way as for QBF, we encode $\phi$ as an array formula $\phi'$ of linear size, in which $a_\phi : (\mathbb{B}^n \times \mathbb{B}^n \Rightarrow \mathbb{B})$ is an array variable representing the edge relation. We then create two intermediate arrays $a, b : (\mathbb{B}^n \times \mathbb{B}^n \Rightarrow \mathbb{N})$ and use the following formula in CaAL to encode the relation $\forall \bar{x}, \bar{x}' : \mathbb{B}^n.\ a[\bar{x}, \bar{x}'] = c[\bar{x}] \wedge b[\bar{x}, \bar{x}'] = c[\bar{x}']$:


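$\mathit{EqColor}(a, b, c) = $

$\quad a = a_n \wedge c = a_0 \wedge \bigwedge_{j=1}^{n} \mathit{proj}_{j+n}(a_j, 0) = \mathit{proj}_{j+n}(a_j, 1) = a_{j-1} \;\wedge$

$\quad b = b_n \wedge c = b_0 \wedge \bigwedge_{j=1}^{n} \mathit{proj}_j(b_j, 0) = \mathit{proj}_j(b_j, 1) = b_{j-1}$


Then we encode the k-color problem with the following CaAL formula:


$\phi' \wedge \mathit{EqColor}(a, b, c) \wedge \mathit{map}_f(a_\phi, a, b) = K(1)$
where $f(e, \mathit{col}_1, \mathit{col}_2) = e \to (\mathit{col}_1 \neq \mathit{col}_2 \wedge \mathit{col}_1 < k \wedge \mathit{col}_2 < k)$.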


Theorem 2. The satisfiability problem of CaAL is NEXPTIME-hard. 


3 Array Semantics of Quantum Circuits 


As an application, we show that CaAL can encode the semantics of quantum 
circuits. Below, we only give a short overview of quantum circuits and define 
notations; for more details, see, e.g., the textbook of Nielsen and Chuang [22]. 

In an n-qubit quantum system, a state is a superposition of computational basis states $\{|j\rangle \mid j \in \{0,1\}^n\}$. For example, for a system with three qubits $x_1$, $x_2$, and $x_3$, the computational basis state $|101\rangle$ (in Dirac notation) denotes a state in which both $x_1$ and $x_3$ are set to 1, and $x_2$ is set to 0. An n-qubit quantum state $s$ is then denoted as a formal sum $\sum_{j \in \{0,1\}^n} c_j |j\rangle$, where $c_0, c_1, \ldots, c_{2^n-1} \in \mathbb{C}$ are complex probability amplitudes satisfying the constraint that $\sum_{j \in \{0,1\}^n} |c_j|^2 = 1$. Intuitively, $|c_j|^2$ is the probability that when we measure the quantum state $s$ in the computational basis, we obtain the basis state $|j\rangle$. The constraint $\sum_{j \in \{0,1\}^n} |c_j|^2 = 1$ states that the probabilities of all computational basis states need to sum up to 1.

We can record a quantum state as an array that maps each computational basis state to its complex probability amplitude. The state $s$ is represented as an array $s : (\mathbb{B}^n \Rightarrow \mathbb{C})$ satisfying $s[j] = c_j$ for all $j \in \{0,1\}^n$; slightly abusing notation, we denote both the state and the array by $s$.
3.1 Quantum Circuits


A quantum circuit consists of a sequence of quantum gates. Each quantum gate defines a specific transformation of quantum states. For example, the Pauli-X gate (the quantum version of the classical “not” gate) on the k-th qubit transforms a state $s$ to $s'$ satisfying $\forall i \in \{0,1\}^{k-1}, b \in \{0,1\}, j \in \{0,1\}^{n-k} : s'[ibj] = s[i\bar{b}j]$, where $\bar{b}$ denotes the negation of $b$; i.e., it negates the k-th index bit. Another example is the Pauli-Z gate on the k-th qubit, which transforms a state $s$ to $s'$ satisfying $\forall i \in \{0,1\}^{k-1}, b \in \{0,1\}, j \in \{0,1\}^{n-k} : s'[ibj] = \mathit{ite}(b, -1 \cdot s[ibj], s[ibj])$. Here, probability amplitudes are multiplied by $-1$ when $b$ is 1, and are unchanged otherwise.

Fig. 2. The EPR circuit, consisting of an H gate and a CX gate with control qubit $x_1$ and target qubit $x_2$.

An H gate, or Hadamard gate, on the k-th qubit transforms a state $s$ to $s'$ satisfying $\forall i \in \{0,1\}^{k-1}, b \in \{0,1\}, j \in \{0,1\}^{n-k}$:


$s'[ibj] = \mathit{ite}\left(b,\ \frac{s[i0j] - s[i1j]}{\sqrt{2}},\ \frac{s[i0j] + s[i1j]}{\sqrt{2}}\right).$


Notice that the amplitude of a basis state of $s'$ is affected by the amplitudes of two basis states of $s$, enabling a more diverse superposition. The division by $\sqrt{2}$ normalizes the probability sum.

A more advanced class of gates are multiple-qubit gates. The CX gate (“controlled-X”) on the control qubit $c$ and target qubit $t$ applies an X gate to $t$ when $c$ is 1, and is the identity otherwise. Formally, assuming $c < t$, the gate transforms a state $s$ to $s'$ satisfying $\forall i_1 \in \{0,1\}^{c-1}, b_c \in \{0,1\}, i_2 \in \{0,1\}^{t-c-1}, b_t \in \{0,1\}, i_3 \in \{0,1\}^{n-t}$:


$s'[i_1 b_c i_2 b_t i_3] = \mathit{ite}(b_c,\ s[i_1 b_c i_2 \bar{b}_t i_3],\ s[i_1 b_c i_2 b_t i_3]).$


The Toffoli gate CCX (“controlled-controlled-X gate”) has two control qubits $c$ and
$d$, and applies the X gate to the target qubit $t$ only when $c = d = 1$.

We have introduced enough quantum gates to define the EPR circuit (Fig. 2), named after Einstein, Podolsky, and Rosen, for constructing the Bell state, i.e., a 2-qubit circuit converting the basis state $|00\rangle$ to the maximally entangled state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. Starting from a state $s$ (represented as an array that maps 00 to 1 and all other indexes to 0), the circuit first applies H on the first qubit $x_1$ (denoted $H_1$ in this paper) to produce the quantum state $s'$ with $s'[00] = s'[10] = \frac{1}{\sqrt{2}}$ and $s'[11] = s'[01] = 0$. Then a $CX_{1,2}$ gate converts it further to $s''$ with $s''[00] = s''[11] = \frac{1}{\sqrt{2}}$ and $s''[01] = s''[10] = 0$. Notice that $CX_{1,2}$ converts $|10\rangle$ to $|11\rangle$, i.e., when $x_1$ is 1, it negates $x_2$.


Note on Complexity. Simulation of a quantum circuit is hard for bounded-error quantum
polynomial time (BQP), a complexity class that is believed to be incomparable with NP,
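Table 3. Semantics of quantum gates in Cartesian array logic. We use $s$ and $s'$ to denote the quantum state before and after executing the circuit.

Gate | Formula
$X_k$ | $\mathit{proj}_k(s', 0) = \mathit{proj}_k(s, 1) \;\wedge\; \mathit{proj}_k(s', 1) = \mathit{proj}_k(s, 0)$
$Y_k$ | $\mathit{proj}_k(s', 0) = \mathit{map}_{(-\omega^2)\cdot}(\mathit{proj}_k(s, 1)) \;\wedge\; \mathit{proj}_k(s', 1) = \mathit{map}_{(\omega^2)\cdot}(\mathit{proj}_k(s, 0))$
$Z_k$ | $\mathit{proj}_k(s', 0) = \mathit{proj}_k(s, 0) \;\wedge\; \mathit{proj}_k(s', 1) = \mathit{map}_{(-1)\cdot}(\mathit{proj}_k(s, 1))$
$S_k$ | $\mathit{proj}_k(s', 0) = \mathit{proj}_k(s, 0) \;\wedge\; \mathit{proj}_k(s', 1) = \mathit{map}_{(\omega^2)\cdot}(\mathit{proj}_k(s, 1))$
$T_k$ | $\mathit{proj}_k(s', 0) = \mathit{proj}_k(s, 0) \;\wedge\; \mathit{proj}_k(s', 1) = \mathit{map}_{(\omega)\cdot}(\mathit{proj}_k(s, 1))$
$H_k$ | $\mathit{proj}_k(s', 0) = \mathit{map}_{(\cdot+\cdot)/\sqrt{2}}(\mathit{proj}_k(s, 0), \mathit{proj}_k(s, 1)) \;\wedge\; \mathit{proj}_k(s', 1) = \mathit{map}_{(\cdot-\cdot)/\sqrt{2}}(\mathit{proj}_k(s, 0), \mathit{proj}_k(s, 1))$
$R_x(\frac{\pi}{2})_k$ | $\mathit{proj}_k(s', 0) = \mathit{map}_{(\cdot+(-\omega^2)\cdot)/\sqrt{2}}(\mathit{proj}_k(s, 0), \mathit{proj}_k(s, 1)) \;\wedge\; \mathit{proj}_k(s', 1) = \mathit{map}_{((-\omega^2)\cdot+\cdot)/\sqrt{2}}(\mathit{proj}_k(s, 0), \mathit{proj}_k(s, 1))$
$R_y(\frac{\pi}{2})_k$ | $\mathit{proj}_k(s', 0) = \mathit{map}_{(\cdot-\cdot)/\sqrt{2}}(\mathit{proj}_k(s, 0), \mathit{proj}_k(s, 1)) \;\wedge\; \mathit{proj}_k(s', 1) = \mathit{map}_{(\cdot+\cdot)/\sqrt{2}}(\mathit{proj}_k(s, 0), \mathit{proj}_k(s, 1))$
$CX_{c,t}$ | $\mathit{proj}_c(s', 0) = \mathit{proj}_c(s, 0) \;\wedge\; \mathit{proj}_t(\mathit{proj}_c(s', 1), 0) = \mathit{proj}_t(\mathit{proj}_c(s, 1), 1) \;\wedge\; \mathit{proj}_t(\mathit{proj}_c(s', 1), 1) = \mathit{proj}_t(\mathit{proj}_c(s, 1), 0)$
$CZ_{c,t}$ | $\mathit{proj}_c(s', 0) = \mathit{proj}_c(s, 0) \;\wedge\; \mathit{proj}_t(\mathit{proj}_c(s', 1), 0) = \mathit{proj}_t(\mathit{proj}_c(s, 1), 0) \;\wedge\; \mathit{proj}_t(\mathit{proj}_c(s', 1), 1) = \mathit{map}_{(-1)\cdot}(\mathit{proj}_t(\mathit{proj}_c(s, 1), 1))$
$CCX_{c,d,t}$ | $\mathit{proj}_c(s', 0) = \mathit{proj}_c(s, 0) \;\wedge\; \mathit{proj}_d(s', 0) = \mathit{proj}_d(s, 0) \;\wedge\; \mathit{proj}_t(\mathit{proj}_d(\mathit{proj}_c(s', 1), 1), 0) = \mathit{proj}_t(\mathit{proj}_d(\mathit{proj}_c(s, 1), 1), 1) \;\wedge\; \mathit{proj}_t(\mathit{proj}_d(\mathit{proj}_c(s', 1), 1), 1) = \mathit{proj}_t(\mathit{proj}_d(\mathit{proj}_c(s, 1), 1), 0)$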

as it can compute exactly the probability amplitudes of a quantum state after 
executing a circuit. We will show that the Cartesian array logic can encode the 
semantics of quantum circuits, so one can also use the logic for quantum circuit 
simulation. Hence, exponential time is the best deterministic algorithm we can 
hope for when solving CaAL formulas. 


3.2 Interpretation of Quantum Gates 


We show the encoding of quantum gates in CaAL in Table 3. Notice that this gate set includes universal gate sets (e.g., H, CX, and T [10]) that can approximate any quantum gate to arbitrary precision. Arbitrary degree rotation can also be supported using the theory of reals as the base theory. This paper presents a precise encoding that only requires a theory of integers. In the table, we use $s$ and $s'$ to denote the quantum states (encoded as arrays) before and after executing a quantum gate. To encode $s' = X_k(s)$, negating the k-th qubit, we use $\mathit{proj}_k(s', 0) = \mathit{proj}_k(s, 1) \wedge \mathit{proj}_k(s', 1) = \mathit{proj}_k(s, 0)$: the slice of $s'$ with k-th index 0 equals the slice of $s$ with k-th index 1, and vice versa. The handling of the Z, S, and T gates is similar, using the map function to multiply the array values with constants. Note that here we use $\omega$ to represent $e^{i\pi/4} = \cos\frac{\pi}{4} + i \sin\frac{\pi}{4} = \frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}$, the unit vector that is at an angle of 45° to the positive real axis in the complex plane. Later we will show that this representation allows a precise algebraic representation of complex numbers using a five-tuple of integers. Observe that $\omega^4 = -1$. The Y gate combines the two constructions; it negates the k-th index qubit and multiplies each projection with different constant coefficients. For the H, $R_x(\frac{\pi}{2})$, and $R_y(\frac{\pi}{2})$ gates, we use a binary map function to update the amplitudes. For the controlled gates, we use the projection function to classify the cases according to the control bits and apply the X or Z gate only when all control bits are 1.


Example 2. We use CaAL to verify the correctness of the EPR circuit (Fig. 2): the circuit transforms the state $|00\rangle$ to $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$. For this, the initial state of the circuit is encoded as an array expression, the H and CX gates are encoded according to Table 3, and the intended final state of the circuit is represented as a negated equation:


$s_0 = \mathit{store}(K(0), (0,0), 1)$
$\wedge\ \mathit{proj}_1(s_1, 0) = \mathit{map}_{(\cdot+\cdot)/\sqrt{2}}(\mathit{proj}_1(s_0, 0), \mathit{proj}_1(s_0, 1))$
$\wedge\ \mathit{proj}_1(s_1, 1) = \mathit{map}_{(\cdot-\cdot)/\sqrt{2}}(\mathit{proj}_1(s_0, 0), \mathit{proj}_1(s_0, 1))$
$\wedge\ \mathit{proj}_1(s_2, 0) = \mathit{proj}_1(s_1, 0)$
$\wedge\ \mathit{proj}_2(\mathit{proj}_1(s_2, 1), 0) = \mathit{proj}_2(\mathit{proj}_1(s_1, 1), 1)$
$\wedge\ \mathit{proj}_2(\mathit{proj}_1(s_2, 1), 1) = \mathit{proj}_2(\mathit{proj}_1(s_1, 1), 0)$
$\wedge\ s_2 \neq \mathit{store}(\mathit{store}(K(0), (1,1), \frac{1}{\sqrt{2}}), (0,0), \frac{1}{\sqrt{2}})$


The formula is unsatisfiable if and only if the EPR circuit correctly performs the transformation.
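
The EPR check can also be replayed by direct computation. The sketch below is an explicit simulation, not the SMT-based check described in the paper: arrays are dictionaries over $\mathbb{B}^2$, amplitudes are floating-point numbers, and the helper names (`h_gate`, `x_gate`, `cx_gate`) are assumptions introduced for illustration. The gates follow the projection-based encoding of Table 3; after projecting away the control index, the sketch shifts the target position explicitly.

```python
from itertools import product
from math import sqrt, isclose

B = (0, 1)

def K(value, n):
    return {idx: value for idx in product(B, repeat=n)}

def proj(a, k, j):
    return {idx[:k - 1] + idx[k:]: v for idx, v in a.items() if idx[k - 1] == j}

def array_store(a, k, j, b):
    out = dict(a)
    for idx, v in b.items():
        out[idx[:k - 1] + (j,) + idx[k - 1:]] = v
    return out

def map_(f, *arrays):
    return {idx: f(*(a[idx] for a in arrays)) for idx in arrays[0]}

def h_gate(s, k):
    """Hadamard on qubit k: both slices of s' are point-wise maps of the slices of s."""
    p0, p1 = proj(s, k, 0), proj(s, k, 1)
    plus = map_(lambda x, y: (x + y) / sqrt(2), p0, p1)
    minus = map_(lambda x, y: (x - y) / sqrt(2), p0, p1)
    return array_store(array_store(s, k, 0, plus), k, 1, minus)

def x_gate(s, k):
    """Pauli-X on qubit k: swap the two slices."""
    return array_store(array_store(s, k, 0, proj(s, k, 1)), k, 1, proj(s, k, 0))

def cx_gate(s, c, t):
    """Controlled-X: apply X to the target inside the slice where the control is 1."""
    t_inner = t - 1 if c < t else t   # the target index shifts once index c is dropped
    return array_store(s, c, 1, x_gate(proj(s, c, 1), t_inner))

s0 = K(0.0, 2)
s0[(0, 0)] = 1.0                      # store(K(0), (0,0), 1)
s1 = h_gate(s0, 1)
s2 = cx_gate(s1, 1, 2)

expected = K(0.0, 2)
expected[(0, 0)] = expected[(1, 1)] = 1 / sqrt(2)
epr_ok = all(isclose(s2[i], expected[i], abs_tol=1e-12) for i in expected)
```

In this simulation `epr_ok` holds, which corresponds to the negated equation of Example 2 being unsatisfiable.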


Representation of Complex Numbers. To achieve accuracy with no loss of precision, in this paper, when working with $\mathbb{C}$, we use a subset of the complex numbers that can be expressed by the following algebraic encoding (cf. [7,30,35]):


$\left(\frac{1}{\sqrt{2}}\right)^k (a + b\omega + c\omega^2 + d\omega^3),$   (9)
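Table 4. Tableau proof rules of the decision procedure for CaAL. Each rule is listed with its premises and its conclusion; alternative branches of a conclusion are separated by "|".

idx. Premises: $a = \mathit{store}(b, \bar{i}, v)$. Conclusion: $v = a[\bar{i}]$
K⇓. Premises: $a = K(v)$, $w = a'[\bar{i}]$, $a \sim a'$. Conclusion: $v = w$
store⇓. Premises: $a = \mathit{store}(b, \bar{i}, v)$, $w = a'[\bar{j}]$, $a \sim a'$. Conclusion: $\bar{i} = \bar{j}$ | $w = b[\bar{j}]$
store⇑. Premises: $a = \mathit{store}(b, \bar{i}, v)$, $w = b'[\bar{j}]$, $b \sim b'$. Conclusion: $\bar{i} = \bar{j}$ | $w = a[\bar{j}]$
map⇓. Premises: $a = \mathit{map}_f(b_1, \ldots, b_m)$, $w = a'[\bar{i}]$, $a \sim a'$. Conclusion: $w = f(b_1[\bar{i}], \ldots, b_m[\bar{i}])$
map⇑. Premises: $a = \mathit{map}_f(b_1, \ldots, b_m)$, $w = b'[\bar{i}]$, $b' \sim b_k$ for some $k \in \{1, \ldots, m\}$. Conclusion: $a[\bar{i}] = f(b_1[\bar{i}], \ldots, b_{k-1}[\bar{i}], w, b_{k+1}[\bar{i}], \ldots, b_m[\bar{i}])$
proj⇓. Premises: $a = \mathit{proj}_k(b, j)$, $w = a'[\bar{i}]$, $a \sim a'$. Conclusion: $w = b[i_1, \ldots, i_{k-1}, j, i_k, \ldots, i_{n-1}]$
proj⇑. Premises: $a = \mathit{proj}_k(b, j)$, $w = b'[\bar{i}]$, $b \sim b'$. Conclusion: $j \neq i_k$ | $w = a[i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_n]$
arrayStore⇓. Premises: $a = \mathit{arrayStore}_k(b, j, c)$, $w = a'[\bar{i}]$, $a \sim a'$. Conclusion: $j = i_k \wedge w = c[i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_n]$ | $j \neq i_k \wedge w = b[\bar{i}]$
arrayStore⇑1. Premises: $a = \mathit{arrayStore}_k(b, j, c)$, $w = b'[\bar{i}]$, $b \sim b'$. Conclusion: $j = i_k$ | $w = a[\bar{i}]$
arrayStore⇑2. Premises: $a = \mathit{arrayStore}_k(b, j, c)$, $w = c'[\bar{i}]$, $c \sim c'$. Conclusion: $w = a[i_1, \ldots, i_{k-1}, j, i_k, \ldots, i_{n-1}]$
ext. Premises: $a : (\sigma^n \Rightarrow \tau)$, $b : (\sigma^n \Rightarrow \tau)$. Conclusion: $a = b$ | $\exists \bar{i} : \sigma^n.\ a[\bar{i}] \neq b[\bar{i}]$
freshIdx. Premises: $i_1, \ldots, i_k : \sigma$. Conclusion: $\exists j : \sigma.\ j \neq i_1 \wedge \cdots \wedge j \neq i_k$
read. Premises: $a : (\sigma^n \Rightarrow \tau)$, $\bar{i} : \sigma^n$. Conclusion: $\exists v : \tau.\ v = a[\bar{i}]$
readCong. Premises: $v = a[\bar{i}]$, $w = b[\bar{j}]$, $a \sim b$. Conclusion: $\bar{i} \neq \bar{j}$ | $\bar{i} = \bar{j} \wedge v = w$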

where $a, b, c, d, k \in \mathbb{Z}$. A complex number is then represented by a five-tuple $(a, b, c, d, k)$. Although the considered set of numbers is only a small subset of $\mathbb{C}$, it is closed under the operations needed to encode quantum gates, and it can arbitrarily closely approximate any complex number. For this, note that $(a, 0, c, 0, k)$ represents $\frac{1}{\sqrt{2}^k}(a + c\omega^2) = \frac{a}{\sqrt{2}^k} + \frac{c}{\sqrt{2}^k} i$, and pick suitable $a$, $c$, and $k$. The representation is also sufficient to describe a set of quantum gates that can implement universal quantum computation (Table 3).
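
The closure of the five-tuple representation under the gate operations can be illustrated with plain integer arithmetic. The sketch below is an assumption-laden illustration, not the encoding used by the paper's implementation: the function names are hypothetical, tuples are not normalized (so one number has several representations), and `to_complex` exists only to cross-check against floating-point values. It relies on the identities $\omega^4 = -1$ and $\sqrt{2} = \omega - \omega^3$.

```python
import cmath

OMEGA = cmath.exp(1j * cmath.pi / 4)  # e^{i*pi/4}

def times_omega(z):
    """Multiply by omega: shift the coefficients, using omega^4 = -1."""
    a, b, c, d, k = z
    return (-d, a, b, c, k)

def raise_k(z):
    """Same number with exponent k + 1: multiply the polynomial by sqrt(2) = omega - omega^3."""
    a, b, c, d, k = z
    return (b - d, a + c, b + d, c - a, k + 1)

def add(z1, z2):
    """Sum of two encoded numbers, after aligning the exponents k."""
    while z1[4] < z2[4]:
        z1 = raise_k(z1)
    while z2[4] < z1[4]:
        z2 = raise_k(z2)
    return tuple(x + y for x, y in zip(z1[:4], z2[:4])) + (z1[4],)

def div_sqrt2(z):
    """Division by sqrt(2) only increments k."""
    return z[:4] + (z[4] + 1,)

def to_complex(z):
    """Floating-point value of the encoded number, for cross-checking only."""
    a, b, c, d, k = z
    return (a + b * OMEGA + c * OMEGA**2 + d * OMEGA**3) / (2 ** 0.5) ** k

one = (1, 0, 0, 0, 0)
half_sqrt = div_sqrt2(one)            # encodes 1/sqrt(2)
```

With these definitions, the amplitude updates of the H gate in Table 3 reduce to `div_sqrt2(add(x, y))` and its subtraction counterpart, using integers only.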
4 A Decision Procedure for Cartesian Arrays


We now present a decision procedure for quantifier-free CaAL. Our calculus is an 
extension of the calculus for CAL [11] with rules for the proj and arrayStore oper- 
ations. For the sake of presentation, we use the setting of analytic tableaux [14], 
although the same proof rules can be used also in a model-constructing calcu- 
lus [11]. 

As a simplifying assumption, in this section we furthermore require that
the index sorts $\sigma$ of an array sort $(\sigma^n \Rightarrow \tau)$ represent infinite domains. This
assumption can be lifted in the same way as for CAL [11], but the details are
orthogonal to the task of supporting the new array operations.


4.1 Preliminaries 


A tableau [14] is a finite tree growing downwards, in which each node is labelled with a formula, the root is labelled with the formula to be refuted, and the children of each node are derived from the formulas on the branch leading to the node using one of the available proof rules. We assume a tableau calculus equipped with a set of standard rules [14]: (i) $\alpha$- and $\beta$-rules for eliminating Boolean connectives $\wedge, \vee$; (ii) $\delta$-rules for eliminating existential quantifiers $\exists$; (iii) rules for reasoning about positive and negative equalities $x = y$ between variables, which include rules for closing proof branches; (iv) rules implementing a decision procedure for the base theory $T_B$.

Our calculus operates on flat formulas, which are formulas in which func- 
tions f only occur in equations y = f(z) in positive positions, i.e., underneath 
an even number of negations, with y, z being variables. Every formula can be 
converted to a flat formula by introducing a linear number of new variables. 
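
The flattening step can be sketched as follows. This is a minimal illustration under stated assumptions: terms are nested tuples `(function_name, arg_1, ..., arg_n)` with strings as variables, and the fresh-variable scheme `_t1, _t2, ...` is invented for the example. One new variable is introduced per function application, which gives the linear bound mentioned above.

```python
from itertools import count

def flatten(term):
    """Return (variable, equations): the variable names the value of `term`,
    and each equation (y, f, args) stands for y = f(args) over variables only."""
    equations = []
    fresh = count(1)

    def go(t):
        if isinstance(t, str):          # already a variable
            return t
        f, *args = t
        arg_vars = [go(arg) for arg in args]
        y = f"_t{next(fresh)}"          # fresh variable naming f(args)
        equations.append((y, f, tuple(arg_vars)))
        return y

    return go(term), equations

# store(K(v), i, x) becomes  _t1 = K(v),  _t2 = store(_t1, i, x)
top, eqs = flatten(('store', ('K', 'v'), 'i', 'x'))
```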

We define proof rules using the following notation: 


$\text{rule}\quad \dfrac{\phi_1 \quad \phi_2 \quad \cdots \quad \phi_k}{\psi_1 \mid \cdots \mid \psi_m}$

The rule is applicable if the premises $\phi_1, \ldots, \phi_k$ occur on a proof branch, and
has the effect of expanding the tableau: the proof branch is split into m new
branches, to which the formulas $\psi_1, \ldots, \psi_m$, respectively, are appended.


In the premises of a rule, we frequently include assumptions $x \sim y$ that
require that the equality $x = y$ follows from positive equalities between variables
on the proof branch. We also use premises $x : \sigma$, stating that $x$ is a variable of
sort $\sigma$ occurring on the proof branch.


4.2 Proof Rules 


The rules of our calculus are shown in Table 4. The rules idx, K⇓, store⇓, store⇑,
map⇓, and map⇑ coincide with the rules used for CAL [11], and define the semantics
of the operators K, store, and map. Extensionality is implemented by the rule ext,
which can be applied for any two array variables a, b of the same type occurring
on a branch.
The semantics of proj and arrayStore is defined, in a similar way as for store, by upward and downward propagation of array reads. Since $\mathit{arrayStore}_k(b, j, c)$ combines two arrays b, c into a single new array, downward propagation has to route reads either to b or to c. Upward propagation from c is always possible, while reads on b can only be propagated if they are not overwritten by c.

For the sake of presentation, we write the conclusion in the rules map⇓, map⇑, and ext in non-flat form, and assume that the transformation to a flat formula happens implicitly by adding existentially quantified variables representing the sub-terms.

Congruence reasoning is necessary only for array reads, and is implemented
using the rule readCong. For simplicity, in our formulation the rule splits over
the cases $\bar{i} \neq \bar{j}$ and $\bar{i} = \bar{j}$, and effectively searches for an arrangement of the
index variables satisfying a formula. An actual implementation could rely on
equality propagation being performed by a theory combination procedure.

As one of the more tricky points, the completeness of the calculus sometimes requires new array reads to be generated. This aspect is covered by the rules $\epsilon_{\neq}$ and $\epsilon_{\delta}$ in CAL [11], which, however, cannot be used directly in our setting of multi-dimensional arrays. To obtain completeness, our calculus sometimes has to construct reads by combining different index variables occurring on a branch, and sometimes has to invent index values that are distinct from all indexes occurring in a formula. The introduction of corresponding new reads is handled by the rules freshIdx and read.


Example 3. Consider arrays $a, b : (\mathbb{Z}^2 \Rightarrow \mathbb{Z})$, and the formulas


$\mathit{proj}_1(a, i) = K(42) \wedge \mathit{proj}_2(a, j) = K(43)$   (10)
$a = K(42) \wedge b = \mathit{store}(a, (i, i), 43) \wedge \mathit{proj}_1(b, i) = K(43)$   (11)


Both formulas are unsatisfiable, but cannot be refuted using the rules discussed so far. In (10), no reads $a[\cdots]$ exist, so that no propagations can be performed by any of the rules. It is necessary to identify the constraints on the value $a[i, j]$ as contradictory. The rule read can be used to introduce a new formula $\exists v.\ v = a[i, j]$ on a proof branch, after which the rules proj⇑ and K⇓ can be applied.

To show that (11) is unsatisfiable, we need to consider a point $(i, j)$ with $j \neq i$ and derive that $a[i, j] = b[i, j] = 42$, contradicting $\mathit{proj}_1(b, i) = K(43)$. The introduction of a fresh index value $j$ (different from $i$) is handled by the rule freshIdx, which relies on the index sort $\sigma$ representing an infinite domain. Once the existence of an index $j \neq i$ has been asserted, the rule read can be used to introduce an equation $v = a[i, j]$, and the contradiction can be derived.
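
As a worked version of the argument for (10), the following sketch spells out the two chains of equalities at the read $a[i, j]$, using only the axioms for proj and K from Table 2. It is a semantic derivation, not a trace of the tableau rules.

```latex
\begin{aligned}
a[i,j] &= \mathit{proj}_1(a,i)[j] = K(42)[j] = 42, \\
a[i,j] &= \mathit{proj}_2(a,j)[i] = K(43)[i] = 43 .
\end{aligned}
```

Both lines constrain the same array element, so they yield $42 = 43$, which closes the branch.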


4.3 Correctness and Complexity 


Theorem 3. The presented tableau calculus is sound and complete for flat
quantifier-free CaAL formulas: there is a closed tableau for a formula $\phi$ if and
only if $\phi$ is unsatisfiable.
Proof. Soundness: As usual, we identify each proof branch with the conjunction of its formulas and a tableau with the disjunction of its proof branches. It can be shown that the tableau before expansion using a proof rule is equi-satisfiable to the tableau after the expansion, modulo the array axioms in Table 2.

Completeness: We make the simplifying assumption that $\phi$ only contains arrays with (infinite) index sort $\sigma$ and value sort $\tau$, and in particular that array sorts are not nested. Completeness for the general case follows by recursively applying model construction.

Consider then the systematic construction of a tableau for a formula $\phi$ by exhaustively applying proof rules under the following restrictions: (i) regularity, i.e., rules are only applied if they lead to new formulas being added to each generated branch; (ii) rule freshIdx can only be applied once on a branch, only after ext has been applied to all pairs a, b of array variables on the branch, and choosing $i_1, \ldots, i_k$ as the set of all variables of sort $\sigma$ on the branch.

Observe that this systematic application of rules terminates: the calculus never introduces new array variables so that only finitely many applications of ext are possible. Note that ext and freshIdx are the only rules introducing new index variables. Since freshIdx is applied at most once on a branch, the set of index variables is bounded, and there is only a bounded number of array reads $v = a[\bar{i}]$.

Assume now that a tableau for $\phi$ cannot be closed, i.e., has at least one branch B that cannot be closed, although all possible rule applications have been performed. We extract a model of $\phi$ from B. Suppose that $M_T = (D_T, I_T)$ is a model that interprets the non-array variables (including index variables), satisfying all literals on B that do not contain array variables, and denote the equivalence class of an array variable a on B by $[a] = \{b \mid a \sim b\}$. Extending $I_T$, we construct an interpretation $I$ with $I((\sigma^n \Rightarrow \tau)) = I_T(\sigma)^n \to I_T(\tau)$ being a function space, and the theory functions $\cdot[\cdot]$, store, K, $\mathit{map}_f$, proj, and arrayStore having their expected meaning. $I$ is constructed in such a way that all array literals on B are satisfied; the satisfaction of compound formulas on B, and in particular of $\phi$, then follows like in the standard Hintikka construction [14].

The interpretation $I(a)$ of an array variable $a : (\sigma^n \Rightarrow \tau)$ is derived from the array reads on $[a]$ occurring on B. The main difficulty is to consistently interpret the (infinitely many) elements of the array that are not mentioned explicitly on B. For this, denote the index variable introduced by the unique freshIdx application on B by $\varepsilon$, and observe that its value $I_T(\varepsilon)$ is distinct from the value of all other index variables. We will use values read from $I_T(\varepsilon)$-locations as default values for the arrays. Let


$R_a = \{((I_T(i_1), \ldots, I_T(i_n)), I_T(v)) \mid v = b[\bar{i}] \text{ occurs on } B \text{ and } a \sim b\}$


be the set of array reads for $a : (\sigma^n \Rightarrow \tau)$. The relation $R_a$ describes a non-empty, consistent (but partial) valuation of the array elements, due to the exhaustive application of rules read and readCong.

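The gaps in $R_a$ will be filled with default values introduced by $\varepsilon$. For this, we define a precedence ordering ${\sqsubseteq} \subseteq I_T(\sigma)^* \times I_T(\sigma)^*$ over index vectors; intuitively, $\bar{c} \sqsubseteq \bar{d}$ if $\bar{c}$ and $\bar{d}$ agree in all components, unless $d_i = I_T(\varepsilon)$, which is interpreted as don't-care:


$(c_1, \ldots, c_k) \sqsubseteq (d_1, \ldots, d_m)$ iff $k = m$ and $\forall i \in \{1, \ldots, k\} : c_i = d_i \vee d_i = I_T(\varepsilon)$

The value of an array variable $I(a) \in I((\sigma^n \Rightarrow \tau))$ is then:


$I(a) = \{(\bar{c}, x) \mid (\bar{d}, x) \in R_a, \text{ where } \bar{c} \sqsubseteq \bar{d} \text{ and for all } (\bar{d}', x') \in R_a : \text{if } \bar{c} \sqsubseteq \bar{d}' \text{ then } \bar{d} \sqsubseteq \bar{d}'\}$


To see that $I(a)$ is functionally consistent, note that whenever $(\bar{d}, x)$ and $(\bar{d}', x')$ exist in $R_a$ such that $\bar{c} \sqsubseteq \bar{d}$ and $\bar{c} \sqsubseteq \bar{d}'$, then there is also some $(\bar{d}'', x'') \in R_a$ such that $\bar{c} \sqsubseteq \bar{d}'' \sqsubseteq \bar{d}, \bar{d}'$. This is because the rule read has been applied exhaustively.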
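
The way default values fill the gaps can be pictured as a most-specific-match lookup. The sketch below is an illustration under assumptions, not part of the proof: the read set is a dictionary from index vectors to values, the value of the fresh index is passed as `eps`, and the names `refines` and `lookup` are hypothetical. The lookup returns the value of the read that matches the queried index vector and precedes every other matching read.

```python
def refines(c, d, eps):
    """c precedes d: equal length, and each d_i equals c_i or is the don't-care value eps."""
    return len(c) == len(d) and all(ci == di or di == eps for ci, di in zip(c, d))

def lookup(reads, c, eps):
    """Value of the array at index vector c, given the finite set of reads."""
    matches = [d for d in reads if refines(c, d, eps)]
    for d in matches:
        # d must be the most specific match, i.e., precede all other matches.
        if all(refines(d, other, eps) for other in matches):
            return reads[d]
    raise KeyError(c)

EPS = 'eps'  # stands for the value of the fresh index variable
reads = {
    (EPS, EPS): 0,   # default for the whole array
    (1, EPS): 5,     # default for the row with first index 1
    (1, 2): 7,       # explicit read
}
```

For instance, `lookup(reads, (1, 2), EPS)` yields the explicit read 7, `lookup(reads, (1, 3), EPS)` falls back to the row default 5, and `lookup(reads, (4, 4), EPS)` falls back to 0.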

It remains to be shown that $I$ satisfies all array literals. By construction, equations $a = b$ will be satisfied. To see that equations $v = a[\bar{i}]$ hold, note that $I(a) \supseteq R_a$. Equations $a \neq b$ are satisfied due to the exhaustive application of ext: there has to be some vector $\bar{i}$ of index variables such that $a[\bar{i}] \neq b[\bar{i}]$.

All other array literals are positive equations of the form $x = f(\bar{y})$, and hold because exhaustive propagation of read atoms was performed. As an example, consider an equation $a = \mathit{proj}_k(b, j)$; it has to be shown that $I(a) = \{((c_1, \ldots, c_{k-1}, c_{k+1}, \ldots, c_n), x) \mid (\bar{c}, x) \in I(b), c_k = I_T(j)\}$. Observe that $R_a = \{((c_1, \ldots, c_{k-1}, c_{k+1}, \ldots, c_n), x) \mid (\bar{c}, x) \in R_b, c_k = I_T(j)\}$ due to the rules proj⇓ and proj⇑. Consider then a point $(\bar{c}, x) \in I(a)$, defined by $(\bar{d}, x) \in R_a$, and the corresponding index vectors $\bar{c}' = (c_1, \ldots, c_{k-1}, I_T(j), c_k, \ldots, c_{n-1})$ and $\bar{d}' = (d_1, \ldots, d_{k-1}, I_T(j), d_k, \ldots, d_{n-1})$ in $R_b$, and show that $(\bar{c}', x) \in I(b)$ is defined by $(\bar{d}', x) \in R_b$.


The proof of the theorem highlights the restrictions necessary to obtain a 
decision procedure for CaAL: all rules should be applied under the condition of 
regularity, and the rule freshIdx has to be restricted to at most one application
per branch, and only after applications of ext have been performed. 

To analyze the runtime, as in the proof of Theorem 3, we assume that there are
no nested array sorts, i.e., index and value sorts are themselves not arrays. To
avoid degenerate cases, we also assume that the size of a formula $\phi$ is at least
the maximum arity of the array variables occurring in it. We then get:


Lemma 1. The satisfiability problem of quantifier-free CaAL formulas $\phi$ without
nested array sorts is in NEXPTIME, assuming that the satisfiability problem
of the base theory is in NP.


Proof. This follows from the proof of Theorem 3. On every branch, the rule ext
can be applied at most quadratically often, and the number of index variables
occurring on a branch is polynomial in the size of the input formula $\phi$. The
number of distinct read atoms $v = a[\bar i]$ that can be introduced on a branch,
and therefore the number of rule applications altogether, is then polynomially
bounded by the number of variables in $\phi$, and exponentially bounded in the
maximum arity of array variables in $\phi$. After exhaustive application of the rules
in Table 4, solving an at most exponential number of base theory formulas (with
at most exponential size) on a branch is in NEXPTIME.


4.4 Optimizations


The calculus and decision procedure are primarily designed with simplicity in 
mind, rather than focusing on practical efficiency. Although the procedure’s com- 
plexity may not be reduced below NEXPTIME, incorporating various optimiza- 
tions can yield significant practical improvements. Two obvious improvements 
to be considered are: (i) The detection of linear array variables, which are 
essentially variables that are assigned to at most once in array literals [11]. It is 
enough to perform upward propagation (rules $\uparrow$) only for non-linear variables.
(ii) The restriction of the number of reads introduced using the rule read. 
In practice, only a few of the generated equations are actually needed to ensure 
completeness. Instead of generating all possible reads eagerly, a procedure could 
focus on the other rules first, and only introduce additional reads when it is 
detected that default values are missing for some sub-arrays. We believe that 
other refinements presented in [11] can be carried over to our decision procedure 
as well. 


Table 5. Experimental results. We list the circuit name, the number of qubits and
gates in the circuit, the verification result, and the execution time.


circuit           | qubits | gates | result | time
$H^2$             |        | 2     | OK     | 3.1s
$H^2$ (bug)       |        | 2     | bug    | 3.0s
BV                |        | 3     | OK     | 3.2s
BV (bug)          |        | 3     | bug    | 3.3s
BV                |        | 5     | OK     | 64s
BV                |        | 8     | OK     | 16.8s
BV                |        | 10    | OK     | 43.2s
BV                |        | 13    | OK     | 1m 59.0s
BV                |        | 15    | OK     | 9m 13s
BV                |        | 18    | OK     | 50m 54s
GroverSingle-Comp |        | 17    | OK     | 5.2s
GroverSingle-Comp |        | 85    | OK     | 51.7s
GroverAll-Comp    |        |       | OK     | 6.8s
GroverAll-Comp    |        |       | OK     | 3m 53s
GroverSingle-Iter |        | 9     | OK     | 3.2s
GroverSingle-Iter |        | 15    | OK     | 4.9s
GroverSingle-Iter |        | 21    | OK     | 8.4s
GroverSingle-Iter |        | 27    | OK     | 17.1s
GroverSingle-Iter |        | 33    | OK     | 46.9s
GroverAll-Iter    |        | 9     | OK     | 3.8s
GroverAll-Iter    |        | 15    | OK     | 14.2s
GroverAll-Iter    |        | 21    | OK     | 37.9s
GroverAll-Iter    |        | 27    | OK     | 4m 51s
GroverAll-Iter    |        | 33    | OK     | 57m 2s

5 Preliminary Experimental Result 


We have implemented the decision procedure proposed for CaAL, the encoding 
of quantum gates using array operations, and of complex numbers as five-tuples 
of integers in the SMT solver Princess [26]. The implementation is still a proof 
of concept and largely unoptimized, so that the results reported in this section 
should be considered preliminary. We evaluate the performance of CaAL based 
on a set of benchmarks for quantum circuit verification. All experiments were
conducted on a server with an AMD EPYC 7742 64-core processor (1.5 GHz),
1,152 GiB of RAM, and a 1 TB SSD running Ubuntu 20.04.5 LTS, but were
restricted to a single core for the sake of fairness. Files to reproduce the
experiments can be found at https://zenodo.org/record/7970588. The experimental
results are shown in Table 5. Specifically, we tested four different verification
problems with different circuit sizes.


— $H^2$: Two consecutive H gates are equal to the identity.

— BV: The (complex) amplitudes of the output quantum state of a
Bernstein-Vazirani circuit [4] have no imaginary parts.

— GroverXXX-Comp: Grover's circuit [17] finds the correct answer with a
probability of 90%.

— GroverXXX-Iter: Each Grover iteration [17] increases the probability of finding
the correct answer.


For Grover’s algorithm, XXX = Single means we check the correctness of 
the circuit against a specific oracle, and XXX = All means we check against all 
possible oracles. We manually injected two bugs (by altering one gate) into two 
examples to demonstrate bug-catching capability. With a timeout of 60 min, our
implementation can analyze circuits with at most 7 qubits and at most 85 gates, 
which are still relatively small circuits. Analyzing the results, we discovered that, 
in particular, the H gates used to create a superposition state at the beginning 
of a circuit are challenging for the array decision procedure, as they lead to an 
exponential number of array reads being created. 


6 Conclusions 


We have presented CaAL, an expressive logic of extensional arrays, with opera- 
tions for reading and storing values, creating constant arrays, a point-wise exten- 
sion of functions on array values to arrays, projection of arrays, and updating 
array slices. We have established that checking the satisfiability of quantifier- 
free CaAL formulas is NEXPTIME-complete, for a base theory in NP and non- 
nested arrays. The root cause for the complexity of CaAL (as opposed to the 
NP complexity of CAL and the standard theory of arrays) is that formulas can 
be constructed in which a cell in one array has dependencies on an exponential
number of cells in another array. In our decision procedure, such situations lead 
to an exponential number of reads generated during propagation. High degrees 
of dependency are typical, however, for quantum circuits. 

We believe that CaAL is a suitable framework for reasoning about quantum 
circuits. Due to the expressiveness of the logic, the encoding of quantum gates 
becomes remarkably succinct and elegant (Table 3), and easily understandable
both for researchers in quantum circuit verification and people in automated rea- 
soning. While theoretically optimal, we consider the decision procedure proposed 
for CaAL only as a first step: the high complexity of CaAL implies that brute- 
force approaches like saturation are unlikely to scale to interesting instances. As 
future work, we therefore plan to explore the use of abstraction methods and of 
more succinct array representations in the decision procedure, thus making it 
possible to exploit the highly structured nature of typical quantum circuits in 
the solving process. We also plan to investigate whether interesting fragments of
CaAL with lower complexity can be identified. 


Abstract. Subsumption resolution is an expensive but highly effective
simplifying inference for first-order saturation theorem provers. We
present a new SAT-based reasoning technique for subsumption resolution,
without requiring radical changes to the underlying saturation algorithm.
We implemented our work in the theorem prover VAMPIRE, and
show that it is noticeably faster than the state of the art.


1 Introduction 


Saturation-based proof search is a popular approach to first-order theorem prov- 
ing [6,14,18]. In addition to efficient inference systems [1,8], saturation provers 
also implement redundancy elimination to reduce the size of the search space. 
Redundancy elimination deletes clauses from the search space by showing them 
to be logical consequences of other (smaller) clauses, and therefore redundant. 
However, checking whether a first-order formula is implied by another first- 
order formula is undecidable, and so eliminating redundant clauses is in gen- 
eral undecidable too. In practice, saturation systems apply cheaper conditions 
for redundancy elimination, such as removing equational tautologies by con- 
gruence closure or deleting subsumed clauses by establishing multiset inclu- 
sion. Recently, SAT solving has been applied to efficiently detect and remove 
subsumed clauses [10]. We extend SAT-based reasoning in first-order theorem 
proving to a combination of subsumption and resolution, subsumption resolu- 
tion [2] (Sect. 4). 

Both subsumption and subsumption resolution are NP-complete [4]. To 
improve efficiency in practice, we (i) encode subsumption resolution as SAT 
formulas over (match) set constraints (Sect. 5) and (ii) directly integrate CDCL 
SAT solving for checking subsumption resolution in first-order theorem prov- 
ing (Sect. 6). We implement our approach in the theorem prover VAMPIRE [6], 
improving the state-of-the-art in first-order reasoning (Sect. 7). 


Related Work. Subsumption and subsumption resolution are some of the most 
powerful and frequently used redundancy criteria in saturation-based provers. 
Subsumption resolution is supported as contextual literal cutting in [14], along 
with efficient approaches for detecting multiset inclusions among clauses [6, 13, 
18]. Special cases of unit deletion as a by-product of subsumption tests are also 
proposed in [16]. Much attention has been given to refinements of term indexing 
[13,16] to drastically reduce the set of candidate clauses checked for
subsumption. Recently, these approaches have been complemented by SAT solving [10],
reducing subsumption checking to SAT. Our work generalises this approach by 
solving for both subsumption and subsumption resolution via SAT. 

SAT solvers have been applied widely to first-order theorem proving, includ- 
ing but not limited to AVATAR [17], instance-based methods [5], heuristic 
grounding [14], global subsumption [12] and combinations thereof [11], but using 
SAT solvers for classical subsumption methods is under-explored. To the best 
of our knowledge, SAT solving for subsumption resolution has so far not been 
addressed in the landscape of automated reasoning. 


2 Illustrative Examples and Main Contributions 


Let us illustrate a few challenges of subsumption resolution, which motivate our 
approach to solving it (Sect. 4). Given a pair of clauses L and M, denoted as 
(L, M), the problem is to decide whether M can be simplified by L via a special 
case of logical consequence. In Fig. 1 we show examples where it is not obvious 
for which pairs $(L_i, M_i)$ subsumption resolution can be applied.


$L_1 := p(x_1, x_2) \lor p(f(x_2), x_3)$            $L_2 := p(x_1) \lor q(x_2)$
$M_1 := p(g(y_1), c) \lor \neg p(f(c), e)$          $M_2 := \neg p(y) \lor \neg q(c)$

$L_3 := p(x_1) \lor q(x_1, x_2) \lor \neg p(x_2)$     $L_4 := p(x_1) \lor q(x_2) \lor r(x_3)$
$M_3 := \neg p(y) \lor q(y, y)$                    $M_4 := \neg p(y_1) \lor q(c)$


Fig. 1. Illustrative examples. 


In fact, subsumption resolution can only be applied to $(L_1, M_1)$. Later, we
show how our approach determines that $M_1$ can be shortened in the presence of
$L_1$ (Example 3.1), but also how the remaining pairs cannot apply subsumption
resolution (Examples 5.1, 5.2, and 4.1). For example, $(L_4, M_4)$ is filtered by
pruning to bypass the SAT routine altogether.


Our Contributions 


1. We cast the problem of subsumption resolution over pairs of first-order for- 
mulas (L, M) as a SAT problem (Theorem 5.1), ensuring any instance of 
subsumption resolution is a model of this SAT problem. 

2. We tailor encodings of subsumption resolution (Sects. 5.1-5.2) for effective 
SAT-based subsumption resolution (Algorithm 1). 

3. We integrate our approach into the saturation loop, solving for subsumption 
and subsumption resolution simultaneously (Sect. 6). 

4. We implement our work in the theorem prover VAMPIRE and showcase our 
practical gains in first-order proving (Sect. 7). 


3 Preliminaries


We assume familiarity with first-order logic with equality. We include standard
Boolean connectives and quantifiers in the language, and the constants $\top$, $\bot$ for
truth and falsehood. We use $x, y, z$ for first-order variables, $c, d, e$ for constants,
$f, g$ for functions, $p, q, r$ for atoms, $l, m$ for literals, and $L, M$ for clauses, all
potentially with indices. If $L$ is a clause $l_1 \lor \ldots \lor l_n$, we sometimes consider it
as a multiset of its literals $l_i$, and write $|L|$ for its cardinality (i.e. the number $n$
of literals in $L$). The empty clause is denoted $\square$. Free variables are universally
quantified. An expression $E$ is a term, atom, literal, clause, or formula.


Substitutions and Matches. A substitution $\sigma$ is a (partial) mapping from
variables to terms. The result of applying a substitution $\sigma$ to an expression $E$ is
denoted $\sigma(E)$ and is the expression obtained by simultaneously replacing each
variable $x$ in $E$ by $\sigma(x)$. For example, the application of $\sigma := \{x \mapsto f(c)\}$ to the
clause $L := \{p(x), q(x, y)\}$ yields $\sigma(L) = \{p(f(c)), q(f(c), y)\}$. Note that $\sigma(L)$ is
a logical consequence of $L$.

A matching substitution, in short a match, between literals $l$ and $m$ is a
substitution $\sigma$ such that $\sigma(l) = m$. For example, the match of $p(x)$ onto $p(f(c))$
is $\{x \mapsto f(c)\}$. Two matches are compatible and can be combined in the same
substitution iff they do not assign different terms to the same variable. For
example, the substitutions $\{x \mapsto f(c), y \mapsto g(d)\}$ and $\{x \mapsto f(c), z \mapsto h(e)\}$ are
compatible, but $\{x \mapsto f(c)\}$ and $\{x \mapsto g(c)\}$ are not.
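
The two notions above can be sketched in a few lines of Python. The term representation (variables as strings, constants and function applications as tuples) and the names `match_term` and `compatible` are assumptions made for illustration only; they do not reflect how VAMPIRE represents terms.

```python
# Assumed representation: a variable is a str ("x"); a constant or function
# application is a tuple whose first element is the symbol, e.g. ("c",) or
# ("f", ("c",)). Atoms use the same tuple shape, e.g. ("p", "x").

def match_term(pattern, target, subst):
    """Return an extension of subst with subst(pattern) == target, or None."""
    if isinstance(pattern, str):              # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    if isinstance(target, str):               # target variables act as constants
        return None
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match_term(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def compatible(s1, s2):
    """Two matches are compatible iff they agree on every shared variable."""
    return all(s2[var] == term for var, term in s1.items() if var in s2)


# The examples from the text.
f_c, g_c, g_d, h_e = ("f", ("c",)), ("g", ("c",)), ("g", ("d",)), ("h", ("e",))
match_px = match_term(("p", "x"), ("p", f_c), {})
```

Matching is one-directional: only variables of the pattern are instantiated, while variables of the target are treated like constants.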


Saturation and Redundancy. Many first-order systems apply the superposition
calculus [1] in a saturation loop [8]. Given an input set $F$ of clauses, saturation
iteratively derives logical consequences and adds them to $F$. By soundness
and completeness of superposition, if $\square$ is derived the system can report
unsatisfiability of $F$; if $\square$ is not encountered and no further clauses can be derived,
the system reports satisfiability of $F$.

Saturation is more efficient when F is as small as possible. For this reason, 
saturation-based provers also employ simplifying inferences. Simplifying infer- 
ences reduce the number or size of clauses in F. This is formalised using the 
following notion of redundancy: a ground clause M is redundant in a set of 
ground clauses F if M is a logical consequence of clauses in F that are strictly 
smaller than M w.r.t. a fixed simplification ordering $\succ$. A non-ground clause M
is redundant in a set of clauses F if each ground instance of M is redundant 
in the set of ground instances of F. If M is redundant in F, then M can be 
removed from F while retaining completeness. 


Subsumption. A clause $L$ subsumes a distinct clause $M$ iff there is a
substitution $\sigma$ such that

$$\sigma(L) \subseteq_M M \tag{1}$$

where $\subseteq_M$ denotes multiset inclusion. We also say that $M$ is subsumed by $L$.
Note that subsumed clauses are redundant.

Removing subsumed clauses $M$ from the search space $F$ is implemented
through a simplifying rule, checking condition (1) over pairs of clauses $(L, M)$
from $F$. Matches between every literal in $L$ to some literal in $M$ are checked; if
a compatible set of matches is found, then $M$ can be removed from $F$.
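
As a point of reference, condition (1) can be checked by plain backtracking. The following sketch assumes the tuple-based term representation described in its comments and is a naive baseline, not the SAT-based procedure developed in this paper; all names are hypothetical.

```python
# Assumed representation: variables are str, other terms and atoms are tuples
# ("symbol", arg, ...); a literal is (is_positive, atom); a clause is a list.

def match_term(pattern, target, subst):
    """Return an extension of subst with subst(pattern) == target, or None."""
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    if isinstance(target, str):
        return None
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match_term(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def subsumes(L, M):
    """Naive search for sigma with sigma(L) a sub-multiset of M."""
    def search(i, used, subst):
        if i == len(L):
            return True
        positive, atom = L[i]
        for j, (m_positive, m_atom) in enumerate(M):
            # Multiset inclusion: each literal of M is used at most once.
            if j in used or positive != m_positive:
                continue
            extended = match_term(atom, m_atom, subst)
            if extended is not None and search(i + 1, used | {j}, extended):
                return True
        return False

    return search(0, frozenset(), {})


c, d, e = ("c",), ("d",), ("e",)
L = [(True, ("p", "x")), (True, ("q", "x", "y"))]
M = [(True, ("p", c)), (True, ("q", c, d)), (True, ("r", e))]
```

Here `subsumes(L, M)` succeeds with $\sigma = \{x \mapsto c, y \mapsto d\}$, whereas $p(x) \lor p(y)$ does not subsume the unit clause $p(c)$, because multiset inclusion would require two copies of $p(c)$.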


Subsumption Resolution. Subsumption resolution aims to remove one redundant
literal from a clause. Clauses $M$ and $L$ are said to be the main and side
premise of subsumption resolution, respectively, iff there is a substitution $\sigma$, a
set of literals $L' \subseteq L$ and a literal $m' \in M$ such that

$$\sigma(L') = \{\neg m'\} \quad \text{and} \quad \sigma(L \setminus L') \subseteq M \setminus \{m'\}. \tag{2}$$

If so, $M$ can be replaced by $M \setminus \{m'\}$. Subsumption resolution is hence the rule

$$(\mathsf{SR}) \quad \frac{L \qquad \cancel{M}}{M \setminus \{m'\}}$$

We indicate the deletion of a clause $M$ by drawing a line through it ($\cancel{M}$),
and we refer to the literal $m'$ of $M$ as the resolution literal of SR. Intuitively,
subsumption resolution is binary resolution followed by subsumption of one of
its premises by the conclusion. However, by combining two inferences into one
it can be treated as a simplifying inference, which is advantageous from the
perspective of proof search dynamics.


Example 3.1. Consider $L_1$, $M_1$ of Fig. 1. Subsumption resolution is applied by
using the substitution $\sigma := \{x_1 \mapsto g(y_1), x_2 \mapsto c, x_3 \mapsto e\}$. Note that $\sigma(L_1) =
p(g(y_1), c) \lor p(f(c), e)$. $\sigma(L_1)$ and $M_1$ can be resolved to obtain $p(g(y_1), c)$. The
clause $p(g(y_1), c)$ subsumes $M_1$, since it is a sub-multiset of $M_1$. We have

$$\frac{p(x_1, x_2) \lor p(f(x_2), x_3) \qquad \cancel{p(g(y_1), c) \lor \neg p(f(c), e)}}{p(g(y_1), c)}$$
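
Condition (2) can also be checked directly by exhaustive search, which is useful as a reference when reading the SAT encodings later on. The sketch below is a brute-force illustration under an assumed tuple-based term representation; it is not the method proposed in this paper, and the function names are hypothetical.

```python
# Assumed representation: variables are str, other terms and atoms are tuples
# ("symbol", arg, ...); a literal is (is_positive, atom); a clause is a list.

def match_term(pattern, target, subst):
    """Return an extension of subst with subst(pattern) == target, or None."""
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    if isinstance(target, str):
        return None
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match_term(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def subsumption_resolution(L, M):
    """Return M without its resolution literal if (2) holds, else None."""
    for j, (m_positive, m_atom) in enumerate(M):      # candidate m'
        def search(i, subst, resolved):
            if i == len(L):
                return resolved                       # L' must be non-empty
            positive, atom = L[i]
            if positive != m_positive:                # l_i joins L': sigma(l_i) = not m'
                ext = match_term(atom, m_atom, subst)
                if ext is not None and search(i + 1, ext, True):
                    return True
            for k, (k_positive, k_atom) in enumerate(M):
                if k == j or positive != k_positive:  # sigma(l_i) in M \ {m'}
                    continue
                ext = match_term(atom, k_atom, subst)
                if ext is not None and search(i + 1, ext, resolved):
                    return True
            return False

        if search(0, {}, False):
            return [lit for k, lit in enumerate(M) if k != j]
    return None


# Clause pairs (L1, M1), (L2, M2) and (L3, M3) of Fig. 1.
c, e = ("c",), ("e",)
L1 = [(True, ("p", "x1", "x2")), (True, ("p", ("f", "x2"), "x3"))]
M1 = [(True, ("p", ("g", "y1"), c)), (False, ("p", ("f", c), e))]
L2 = [(True, ("p", "x1")), (True, ("q", "x2"))]
M2 = [(False, ("p", "y")), (False, ("q", c))]
L3 = [(True, ("p", "x1")), (True, ("q", "x1", "x2")), (False, ("p", "x2"))]
M3 = [(False, ("p", "y")), (True, ("q", "y", "y"))]
```

On $(L_1, M_1)$ the search returns the single literal $p(g(y_1), c)$, in line with Example 3.1; on $(L_2, M_2)$ and $(L_3, M_3)$ it returns `None`.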


4 SAT-Based Subsumption Resolution 


We describe the main steps of our SAT-based approach for deciding the appli- 
cability of subsumption resolution on a pair (L, M) of clauses. The core of our 
work solves (2) by finding match substitutions between literals in L and M. Our 
technique is summarised in Algorithm 1. 


Pruning. The first step of Algorithm 1 prunes pairs (L, M) of clauses that 
cannot be simplified by subsumption resolution due to a syntactic restriction 
over symbols in L and M, viz. whether the set of predicates in L is a subset of 
the predicates in M. If not, then there is a literal in L that cannot be matched 
to any literal in M, and hence subsumption resolution cannot be applied. 


Example 4.1. The clause pair $(L_4, M_4)$ from Fig. 1 is pruned by Algorithm 1:
the sets of predicates in $L_4$ and $M_4$ are respectively $\{p, q, r\}$ and $\{p, q\}$, implying
that the literal $r(x_3)$ of $L_4$ cannot be matched to any literal in $M_4$.
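
The pruning test amounts to a subset check on predicate symbols. A minimal sketch, assuming literals are represented as `(is_positive, atom)` pairs with the predicate symbol as the first element of the atom tuple (the function name `pruned` mirrors Algorithm 1, but the body is only illustrative):

```python
def predicates(clause):
    """Set of predicate symbols occurring in a clause."""
    return {atom[0] for _, atom in clause}


def pruned(L, M):
    """True iff some predicate of L does not occur in M, so that L cannot
    simplify M by subsumption resolution."""
    return not predicates(L) <= predicates(M)


# Clause pairs (L4, M4) and (L1, M1) of Fig. 1; variables are plain strings.
c, e = ("c",), ("e",)
L4 = [(True, ("p", "x1")), (True, ("q", "x2")), (True, ("r", "x3"))]
M4 = [(False, ("p", "y1")), (True, ("q", c))]
L1 = [(True, ("p", "x1", "x2")), (True, ("p", ("f", "x2"), "x3"))]
M1 = [(True, ("p", ("g", "y1"), c)), (False, ("p", ("f", c), e))]
```

Polarities are deliberately ignored here, since a literal of $L$ may be matched either to a literal of $M$ or to the complement of the resolution literal.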


Algorithm 1. SAT-based subsumption resolution over pair (L, M) of clauses
ms ← createMatchSet()
solver ← createSatSolver(ms)
procedure SUBSUMPTIONRESOLUTION(L, M)
    if pruned(L, M) then
        return NoSubsumptionResolution
    if fillMatchSet(ms, L, M) is false then
        return NoSubsumptionResolution
    encodeConstraints(solver, ms)
    if solver.solve() is SAT then
        return buildConclusion(solver.getSolution(), M)    ▷ conclusion of subsumption resolution
    return NoSubsumptionResolution


Match Set. The match set of Algorithm 1 computes matching substitutions
over literals of $L$ and $M$. The match set ms consists of a sparse matrix that
assigns each literal pair $(l_i, m_j) \in L \times M$ a substitution $\sigma_{i,j}$ such that $\sigma_{i,j}(l_i) =
m_j$ or $\sigma_{i,j}(l_i) = \neg m_j$. In addition, a polarity $P_{i,j}$ is also assigned to $(l_i, m_j)$, as
follows: we set polarity $P_{i,j} = +$ if $\sigma_{i,j}(l_i) = m_j$ and $P_{i,j} = -$ if $\sigma_{i,j}(l_i) = \neg m_j$.
This matrix is sparse because in general not all literal pairs $(l_i, m_j) \in L \times M$
can be matched. Additionally, it is again possible to prune $(L, M)$ while filling
the match set: if a row of the match set is empty, then there is some literal in $L$
that cannot be matched to any literal in $M$. In this case, subsumption resolution
cannot use $L$ to simplify $M$, so the pair $(L, M)$ is pruned.
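
A possible shape of the match set is sketched below as a dictionary from index pairs to a polarity and a substitution. The representation of terms and the name `fill_match_set` are assumptions for illustration; indices are 0-based here, unlike the 1-based indices used in the text.

```python
# Assumed representation: variables are str, other terms and atoms are tuples
# ("symbol", arg, ...); a literal is (is_positive, atom); a clause is a list.

def match_term(pattern, target, subst):
    """Return an extension of subst with subst(pattern) == target, or None."""
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    if isinstance(target, str):
        return None
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match_term(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def fill_match_set(L, M):
    """Sparse matrix {(i, j): (polarity, sigma_ij)}, or None if a row is empty."""
    match_set = {}
    for i, (l_positive, l_atom) in enumerate(L):
        row_empty = True
        for j, (m_positive, m_atom) in enumerate(M):
            subst = match_term(l_atom, m_atom, {})
            if subst is not None:
                polarity = "+" if l_positive == m_positive else "-"
                match_set[(i, j)] = (polarity, subst)
                row_empty = False
        if row_empty:
            return None          # some literal of L has no match: prune (L, M)
    return match_set


# Clause pairs (L1, M1) and (L4, M4) of Fig. 1.
c, e = ("c",), ("e",)
L1 = [(True, ("p", "x1", "x2")), (True, ("p", ("f", "x2"), "x3"))]
M1 = [(True, ("p", ("g", "y1"), c)), (False, ("p", ("f", c), e))]
L4 = [(True, ("p", "x1")), (True, ("q", "x2")), (True, ("r", "x3"))]
M4 = [(False, ("p", "y1")), (True, ("q", c))]
ms1 = fill_match_set(L1, M1)
```

For $(L_1, M_1)$ three of the four entries are populated; for $(L_4, M_4)$ the row of $r(x_3)$ stays empty and the pair is pruned.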


SAT Solver. The solver of Algorithm 1 is the CDCL-based SAT solver introduced
previously [10], which supports reasoning over matching substitutions in
addition to standard propositional reasoning. This solver also features direct
support for AtMostOne constraints. Solver performance was tuned for subsumption,
which we retain for subsumption resolution. Each propositional variable $v$ is
associated with a substitution $\sigma_v$, and the solver ensures that all substitutions $\sigma_v$,
for which $v$ is assigned $\top$ in the current model, are compatible. Conceptually, a
global substitution $\sigma$ satisfying the invariant $\sigma = \bigcup \{\sigma_v \mid v = \top\}$ is kept in the
SAT solver. In the following, we will write this binding as $v \Rightarrow \sigma_v \subseteq \sigma$.


Example 4.2. Suppose propositional variables $v_1$ and $v_2$ are associated with
substitutions $\sigma_1 := \{x \mapsto y\}$ and $\sigma_2 := \{x \mapsto z\}$, respectively. As $\sigma_1$ and $\sigma_2$ are
incompatible, the solver will block assigning $v_1 = \top$ and $v_2 = \top$ simultaneously
since it would break the above invariant.


Encoding Constraints. Given the match set of $(L, M)$, we formalise the
subsumption resolution problem (2) as the conjunction of four constraints over
matching substitutions. Our formalisation is given in Theorem 5.1 and is
complete in the following sense: subsumption resolution can be applied over $(L, M)$
iff the constraints of Theorem 5.1 are jointly satisfiable. Application of subsumption
resolution is tested via satisfiability checking over our constraints from Theorem 5.1.
Encodings of our subsumption resolution constraints are given in Sect. 5.


Building the Conclusion. If a model is found for the constraints encoding
subsumption resolution, the conclusion $M \setminus \{m'\}$ of SR is built using the model.


5 Subsumption Resolution and SAT Encodings 


As mentioned in Sect. 4, we turn the application of subsumption resolution SR
over $(L, M)$ into the satisfiability checking problem of Algorithm 1. We give our
formalisation of SR in Theorem 5.1, followed by two encodings to SAT
(Sects. 5.1 and 5.2) and adjustments to subsumption (Sect. 5.3).


Theorem 5.1 (Subsumption Resolution Constraints). Clauses $M$ and $L$
are the main and side premise, respectively, of an instance of the subsumption
resolution rule SR iff there exists a substitution $\sigma$ that satisfies the following four
properties:


existence       $\exists i, j.\ \sigma(l_i) = \neg m_j$    (3)
uniqueness      $\exists j'.\ \forall i, j.\ (\sigma(l_i) = \neg m_j \Rightarrow j = j')$    (4)
completeness    $\forall i.\ \exists j.\ (\sigma(l_i) = \neg m_j \lor \sigma(l_i) = m_j)$    (5)
coherence       $\forall j.\ (\exists i.\ \sigma(l_i) = m_j \Rightarrow \forall i.\ \sigma(l_i) \neq \neg m_j)$    (6)


We relate these constraints to the definition of subsumption resolution (2).
The existence property (3) requires a literal $m_j$ in $M$ such that a literal $l_i$ of
$L$ can be matched to $\neg m_j$, ensuring the existence of the resolution literal in SR.
Uniqueness (4) asserts that the resolution literal $m_j$ of SR is unique, required
because SR performs only a single resolution step. Completeness (5) requires
each literal in $L$ be matched either to the complement of a resolution literal,
or to a literal in $M$. Since each (complementary) literal in $L$ is matched to one
(resolution) literal of $M$, the completeness property ensures that the conclusion
of SR subsumes $M$. Finally, coherence (6) states that all literals in $M$ must be
matched by literals in $L$ with uniform polarity. This implies that all literals of
$L$ other than the resolution literal are present in the conclusion of SR. We note
that these constraints can be used to recreate Example 3.1.


Example 5.1. The clause pair $(L_2, M_2)$ of Fig. 1 does not satisfy the uniqueness
property: both the match between $p(x_1)$ and $\neg p(y)$ and the match between $q(x_2)$
and $\neg q(c)$ are negative and so no substitution can satisfy all constraints
simultaneously. Therefore, subsumption resolution cannot be applied over $(L_2, M_2)$.


Example 5.2. The clause pair $(L_3, M_3)$ violates the coherence property for all
possible $\sigma$, since a negative map from $p(x_1)$ to $\neg p(y)$ cannot coexist with a
positive map from $\neg p(x_2)$ to $\neg p(y)$. Subsumption resolution cannot be performed
over $(L_3, M_3)$.


5.1 Direct SAT Encoding of Subsumption Resolution


We present our encoding of subsumption resolution constraints as a SAT prob- 
lem, allowing us to use Algorithm 1 for deciding the application of SR. In the 
sequel we consider the clauses L, M as in Theorem 5.1. 


Compatibility. We introduce indexed propositional variables $b^+_{i,j}$ and $b^-_{i,j}$ to
represent $\sigma(l_i) = m_j$ and $\sigma(l_i) = \neg m_j$ respectively, which we use to track
compatible matching substitutions between literals of $L$ and $M$. More precisely, a
propositional variable is created if and only if the corresponding match is
possible (i.e., in the formulas below, if no match exists, replace the corresponding
propositional variable by $\bot$). As it is not possible to have simultaneously a
substitution $\sigma_{i,j}(l_i) = m_j$ and $\sigma_{i,j}(l_i) = \neg m_j$, we also write $b_{i,j}$ to mean either
$b^+_{i,j}$ or $b^-_{i,j}$ when the polarity of the match is irrelevant. Following Sect. 4, the
variables are bound to their substitutions:


SAT-based compatibility    $\bigwedge_i \bigwedge_j \left[ b_{i,j} \Rightarrow \sigma_{i,j} \subseteq \sigma \right]$    (7)


SR Constraints. Constraints (3)-(6) of Theorem 5.1 employ bounded
quantification over the finite number of literals in $L$, $M$. Expanding these quantifiers over
their respective domains, we translate them into the following SAT formulas:


SAT-based existence       $\bigvee_i \bigvee_j b^-_{i,j}$    (8)

SAT-based uniqueness      $\bigwedge_j \bigwedge_i \bigwedge_{i'} \bigwedge_{j' > j} \neg b^-_{i,j} \lor \neg b^-_{i',j'}$    (9)

SAT-based completeness    $\bigwedge_i \bigvee_j b_{i,j}$    (10)

SAT-based coherence       $\bigwedge_j \bigwedge_i \bigwedge_{i'} \neg b^+_{i,j} \lor \neg b^-_{i',j}$    (11)


SR as SAT Problem. Based on the above, application of subsumption resolution
is decided by the satisfiability of $(7) \land (8) \land (9) \land (10) \land (11)$. This SAT formula
extended with substitutions represents the result of encodeConstraints() in
Algorithm 1 and is used further in Algorithm 3. When this formula is satisfiable, we
construct the substitution $\sigma$ required for SR by

$$\sigma = \bigcup \{\sigma_{i,j} \mid b_{i,j} = \top\}.$$

From the model of the SAT solver, we extract the first literal $b^-_{i,j}$ assigned $\top$,
from which we conclude that the $j$-th literal in $M$ is the resolution literal of SR.
As such, application of SR over $L$ and $M$ results in replacing $M$ by $M \setminus \{m_j\}$.


Remark 5.1. Implicitly, all $l_i$ literals are mapped to at most one literal $m_j$.
Indeed, if there were several literals $m_j$ such that $\sigma(l_i) = m_j$ or $\sigma(l_i) = \neg m_j$, then
either the respective matches are not compatible (guarded by the compatibility
property (7)), there are identical literals in $M$, or $M$ is a tautology (which is not
allowed).


Remark 5.2. While we defined $b_{i,j}$ to be true if, and only if, $\sigma_{i,j} \subseteq \sigma$, we only
encode the sufficient condition $b_{i,j} \Rightarrow \sigma_{i,j} \subseteq \sigma$. The completeness property (10)
together with Remark 5.1 state that each $l_i$ must have exactly one match to
some $m_j$ or $\neg m_j$. Therefore, if $\sigma_{i,j} \subseteq \sigma$ then the respective $b_{i,j}$ must be true and
the condition also becomes necessary: $b_{i,j} \Leftrightarrow \sigma_{i,j} \subseteq \sigma$.


Example 5.3. Consider the pair $(L_1, M_1)$ of Fig. 1. The match set ms of
Algorithm 1 is:

$$\sigma_{i,j} = \begin{bmatrix} \{x_1 \mapsto g(y_1),\ x_2 \mapsto c\} & \{x_1 \mapsto f(c),\ x_2 \mapsto e\} \\ \bot & \{x_2 \mapsto c,\ x_3 \mapsto e\} \end{bmatrix}$$

Since $\sigma_{2,1}$ is incompatible with any substitution, $b_{2,1} = \bot$ need not be defined.
This also allows us to disregard SAT clauses that are trivially satisfied. The
existence (8) and completeness (10) properties cannot have empty clauses: this is
easily detected while filling the match set, and the instance of SR is pruned.
Adding falsified literals in these constraints is unnecessary. The uniqueness (9)
and coherence (11) properties have only negative polarity literals and therefore
there is no need to add clauses containing $b_{2,1}$. In light of the previous comment,
we use variables $b^+_{1,1}$, $b^-_{1,2}$ and $b^-_{2,2}$ and encode SR using the following constraints:


$b^+_{1,1} \Rightarrow \{x_1 \mapsto g(y_1),\ x_2 \mapsto c\} \subseteq \sigma$    SAT-based compatibility of $b^+_{1,1}$
$b^-_{1,2} \Rightarrow \{x_1 \mapsto f(c),\ x_2 \mapsto e\} \subseteq \sigma$    SAT-based compatibility of $b^-_{1,2}$
$b^-_{2,2} \Rightarrow \{x_2 \mapsto c,\ x_3 \mapsto e\} \subseteq \sigma$    SAT-based compatibility of $b^-_{2,2}$
$b^-_{1,2} \lor b^-_{2,2}$    SAT-based existence
$b^+_{1,1} \lor b^-_{1,2}$    SAT-based completeness, $i = 1$
$b^-_{2,2}$    SAT-based completeness, $i = 2$


The uniqueness (9) and coherence (11) properties are trivial here because the
problem is simple: all $b^-_{i,j}$ have the same $j$, and no literal $m_j$ can be mapped
with different polarities. By using SAT solving from Algorithm 1 over the above
SAT constraints, we obtain the SAT model $b^+_{1,1} \land \neg b^-_{1,2} \land b^-_{2,2}$, with $b^-_{2,2}$ the first
literal assigned $\top$ with negative polarity. The application of SR over $(L_1, M_1)$
yields the conclusion $M_1 \setminus \{m_2\} = p(g(y_1), c)$, replacing $M_1$.
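
The direct encoding can be replayed on this example with a small script. The sketch below takes the match set as given, generates clauses (8)-(11), and replaces the CDCL solver of [10] by brute-force enumeration with an explicit compatibility check for (7); this substitution of the solver, the opaque string representation of terms, and the names `encode_direct` and `solve` are assumptions made for illustration.

```python
from itertools import product


def encode_direct(ms, n_L):
    """ms maps (i, j) to (polarity, substitution); returns CNF clauses.

    A clause is a list of (variable, sign) pairs, with variable = (i, j).
    """
    pos = [k for k, (pol, _) in ms.items() if pol == "+"]
    neg = [k for k, (pol, _) in ms.items() if pol == "-"]
    clauses = [[(k, True) for k in neg]]                     # (8) existence
    clauses += [[(a, False), (b, False)]                     # (9) uniqueness
                for a in neg for b in neg if a[1] < b[1]]
    clauses += [[(k, True) for k in ms if k[0] == i]         # (10) completeness
                for i in range(1, n_L + 1)]
    clauses += [[(p, False), (n, False)]                     # (11) coherence
                for p in pos for n in neg if p[1] == n[1]]
    return clauses


def solve(ms, clauses):
    """Brute-force stand-in for a SAT solver with substitution support."""
    keys = list(ms)
    for values in product([False, True], repeat=len(keys)):
        model = dict(zip(keys, values))
        if not all(any(model[v] == sign for v, sign in cl) for cl in clauses):
            continue
        sigma, ok = {}, True
        for k in keys:
            if model[k]:                                     # (7) compatibility
                for var, term in ms[k][1].items():
                    ok = ok and sigma.setdefault(var, term) == term
        if ok:
            # Index j of the resolution literal: first true negative variable.
            j = next(k[1] for k in keys if model[k] and ms[k][0] == "-")
            return model, sigma, j
    return None


# Match set of Example 5.3 (1-based indices, terms kept as opaque strings).
ms = {
    (1, 1): ("+", {"x1": "g(y1)", "x2": "c"}),
    (1, 2): ("-", {"x1": "f(c)", "x2": "e"}),
    (2, 2): ("-", {"x2": "c", "x3": "e"}),
}
clauses = encode_direct(ms, n_L=2)
model, sigma, j = solve(ms, clauses)
```

For this match set, only the existence clause and the two completeness clauses are generated besides an absent uniqueness and coherence part, and the only model sets $b^+_{1,1}$ and $b^-_{2,2}$ to true, identifying $m_2$ as the resolution literal.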


5.2 Indirect SAT Encoding of Subsumption Resolution 


SAT-based formulas (9) and (11) may yield many constraints, with worst-case
complexity $O(|L|^2 |M|^2)$. In practice such situations rarely occur, since the match
set ms is sparsely populated. Nevertheless, to alleviate this worst-case
complexity, we further constrain the approach of Sect. 5.1. We introduce structuring
propositional variables $c_j$ such that $c_j$ is $\top$ iff there exists a literal $l_i$ with
$\sigma(l_i) = \neg m_j$, which we encode as:


SAT-based structurality    $\bigwedge_j \left[ \neg c_j \lor \bigvee_i b^-_{i,j} \right] \land \bigwedge_j \bigwedge_i \left( c_j \lor \neg b^-_{i,j} \right)$    (12)


SR as revised SAT problem. While the compatibility property (7) remains
unchanged, the SR constraints of Theorem 5.1 are revised as given below.


SAT-based revised existence       $\bigvee_j c_j$    (13)
SAT-based revised uniqueness      $\mathrm{AtMostOne}(\{c_j,\ j = 1, \ldots, |M|\})$    (14)
SAT-based revised completeness    $\bigwedge_i \bigvee_j b_{i,j}$    (15)
SAT-based revised coherence       $\bigwedge_j \bigwedge_i \left( \neg c_j \lor \neg b^+_{i,j} \right)$    (16)


Similarly to Sect. 5.1, application of subsumption resolution is decided via
Algorithm 1 by checking satisfiability of $(7) \land (12) \land (13) \land (14) \land (15) \land (16)$. Using
the above SAT formula as the result of encodeConstraints() in Algorithm 1, the
worst-case behaviour is eliminated in exchange for $O(|M|)$ propositional variables
$c_j$. While the direct encoding of Sect. 5.1 is more efficient on small problems,
as it requires fewer variables and constraints, the indirect encoding of this
section is expected to behave better on larger problems (see Sect. 7).


Remark 5.3. Note that the uniqueness property (14) is handled via AtMostOne
constraints, based on the approach of [10]. If a variable $c_j$ is set to $\top$, then our
SAT solver in Algorithm 1 infers that all other variables $c_{j'}$ are set to $\bot$.


Example 5.4. Consider again the clause pair $(L_1, M_1)$ of Fig. 1. Compared to
Example 5.3, our revised encoding of SR requires one additional variable $c_2$, as
$m_2$ in Example 5.3 is used with negative polarity. The revised constraints are:


$b^+_{1,1} \Rightarrow \{x_1 \mapsto g(y_1),\ x_2 \mapsto c\} \subseteq \sigma$    SAT-based compatibility of $b^+_{1,1}$
$b^-_{1,2} \Rightarrow \{x_1 \mapsto f(c),\ x_2 \mapsto e\} \subseteq \sigma$    SAT-based compatibility of $b^-_{1,2}$
$b^-_{2,2} \Rightarrow \{x_2 \mapsto c,\ x_3 \mapsto e\} \subseteq \sigma$    SAT-based compatibility of $b^-_{2,2}$
$\neg c_2 \lor b^-_{1,2} \lor b^-_{2,2}$    SAT-based structurality of $c_2$
$c_2 \lor \neg b^-_{1,2}$    SAT-based structurality of $c_2$
$c_2 \lor \neg b^-_{2,2}$    SAT-based structurality of $c_2$
$c_2$    SAT-based revised existence
$\mathrm{AtMostOne}(\{c_2\})$    SAT-based revised uniqueness
$b^+_{1,1} \lor b^-_{1,2}$    SAT-based revised completeness, $i = 1$
$b^-_{2,2}$    SAT-based revised completeness, $i = 2$


The SAT solver returns $b^+_{1,1} \land \neg b^-_{1,2} \land b^-_{2,2} \land c_2$ as a solution to the above SAT
problem, from which the application of SR yields a similar result to that of
Example 5.3.
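
The revised constraints of this example can be generated mechanically from the match set. The following sketch assumes the same dictionary-based match set as input, creates a structuring variable $c_j$ only for columns with at least one negative match (for all other columns $c_j$ would be forced to $\bot$), and models AtMostOne as a separate group checked by brute force rather than by the solver of [10]; the names `encode_indirect` and `solve` are hypothetical.

```python
from itertools import product


def encode_indirect(ms, n_L, n_M):
    """Return (clauses, at_most_one_group) for constraints (12)-(16)."""
    clauses, group = [], []
    for j in range(1, n_M + 1):
        neg = [k for k in ms if k[1] == j and ms[k][0] == "-"]
        pos = [k for k in ms if k[1] == j and ms[k][0] == "+"]
        if not neg:
            continue                     # c_j is false; its clauses are trivial
        c = ("c", j)
        group.append(c)
        clauses.append([(c, False)] + [(k, True) for k in neg])   # (12)
        clauses += [[(c, True), (k, False)] for k in neg]         # (12)
        clauses += [[(c, False), (k, False)] for k in pos]        # (16)
    clauses.append([(c, True) for c in group])                    # (13)
    clauses += [[(k, True) for k in ms if k[0] == i]              # (15)
                for i in range(1, n_L + 1)]
    return clauses, group                # group is the AtMostOne set of (14)


def solve(ms, clauses, group):
    """Brute-force stand-in for a SAT solver with AtMostOne and substitutions."""
    variables = list(ms) + group
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if sum(model[c] for c in group) > 1:                      # (14)
            continue
        if not all(any(model[v] == sign for v, sign in cl) for cl in clauses):
            continue
        sigma, ok = {}, True
        for k in ms:
            if model[k]:                                          # (7)
                for var, term in ms[k][1].items():
                    ok = ok and sigma.setdefault(var, term) == term
        if ok:
            return model
    return None


# Match set of Example 5.3 (1-based indices, terms kept as opaque strings).
ms = {
    (1, 1): ("+", {"x1": "g(y1)", "x2": "c"}),
    (1, 2): ("-", {"x1": "f(c)", "x2": "e"}),
    (2, 2): ("-", {"x2": "c", "x3": "e"}),
}
clauses, group = encode_indirect(ms, n_L=2, n_M=2)
model = solve(ms, clauses, group)
```

The script produces the six propositional clauses listed above together with the AtMostOne group $\{c_2\}$, and its only model agrees with the solution reported in the example.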


Remark 5.4. We note that our method naturally supports commutative predicates,
such as equality. Let $\simeq$ denote object-level equality. Suppose we have
literals $l_i := a \simeq b$ and $m_j := c \simeq d$. Two propositional variables with associated
matching substitutions $\sigma_{i,j}$ and $\sigma'_{i,j}$ are introduced, where $\sigma_{i,j}$ matches $a \simeq b$
against $c \simeq d$ and $\sigma'_{i,j}$ matches $a \simeq b$ against $d \simeq c$. If zero or one matches exist,
then the problem behaves exactly like the non-symmetric case. If both matches
exist, then $\sigma_{i,j}$ and $\sigma'_{i,j}$ must be incompatible: otherwise, $c$ and $d$ would be
identical terms and the trivial literal $m_j$ would have been eliminated. Therefore, our
SAT-based encodings for subsumption resolution do not need to be adapted and
behave as expected.


5.3 SAT Constraints for Subsumption 


In the new framework of Algorithm 1, the formulation suggested by [10] was
adjusted to work with subsumption resolution. Algorithm 1 needs very little
adaptation for subsumption: the encodeConstraints() method uses the encoding
below, and the conclusion need not be built as only the satisfiability of the
formulas is relevant. The rewritten SAT encoding becomes:


subsumption compatibility    $\bigwedge_i \bigwedge_j \left( b^+_{i,j} \Rightarrow \sigma_{i,j} \subseteq \sigma \right)$    (17)
subsumption completeness     $\bigwedge_i \bigvee_j b^+_{i,j}$    (18)
multiplicity conservation    $\bigwedge_j \mathrm{AtMostOne}(\{b^+_{i,j},\ i = 1, \ldots, |L|\})$    (19)


Note that the set of propositional variables used in our SAT-based formulas 
(17)-(19) encoding subsumption is a subset of the variables used by our SAT- 
based subsumption resolution constraints. 


Pruning for Subsumption. The pruning technique described in Sect. 4 can 
be adapted into a stronger form for subsumption. In this case, we will check for 
multi-set inclusion between multi-sets of (predicates, polarity) pairs. 
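
As an illustration of this pruning check, the sketch below compares multisets of (predicate, polarity) pairs with `collections.Counter`. The literal representation `(polarity, predicate, args)` and the function names are assumptions made for this example, not taken from the implementation.

```python
from collections import Counter

def signature(clause):
    """Multiset of (predicate, polarity) pairs occurring in a clause."""
    return Counter((pred, pol) for pol, pred, _args in clause)

def pruned_for_subsumption(L, M):
    """True if L cannot subsume M: some (predicate, polarity) pair
    occurs more often in L than in M."""
    sig_l, sig_m = signature(L), signature(M)
    return any(sig_m[key] < count for key, count in sig_l.items())
```

A pair that passes this test may still fail to subsume; the check only rules out candidates cheaply before any matching or SAT solving is done.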


6 SAT-Based Subsumption Resolution in Saturation 


In this section we discuss the integration of our SAT-based subsumption resolu- 
tion approach within saturation-based proof search. 


Forward/Backward Simplifications. For the purpose of efficient reasoning, 
saturation algorithms use two main variants of simplification inferences 
implementing redundancy. Forward simplifications are applied on a newly generated 
clause M to check whether M can be simplified by an existing clause L. Backward 
simplifications use a newly generated clause L to check whether L can simplify 
existing clauses M. Backward simplification tends to be more expensive. 


Algorithm 2. SAT-based subsumption in saturation 
ms ← createMatchSet() 
solver ← createSatSolver(ms) 
procedure SUBSUMPTION(L, M) 
    Fs, Fsr ← pruned(L, M) 
    ▷ Fs (resp. Fsr) gets true if subsumption (resp. subsumption resolution) cannot succeed 
    fillMatchSet(ms, L, M)    ▷ Build the whole match set, and update Fs and Fsr 
    if Fs then    ▷ subsumption cannot be applied 
        return NoSubsumption 
    encodeConstraints(solver, ms)    ▷ SAT constraints of Sect. 5.3 
    if solver.solve() is SAT then 
        return Subsumed 
    else 
        return NoSubsumption 


Algorithm 3. SAT-based subsumption resolution in saturation, with subsumption already set up via Algorithm 2 
procedure SUBSUMPTIONRESOLUTION(L, M) 
    ▷ upon Algorithm 2 failing to subsume 
    ▷ the match set is already set up 
    if Fsr then 
        return NoSubsumptionResolution 
    encodeConstraints(solver, ms)    ▷ SAT constraints of Sect. 5.1 or Sect. 5.2 
    if solver.solve() is SAT then 
        return buildConclusion(solver.getSolution(), M)    ▷ conclusion of subsumption resolution 
    return NoSubsumptionResolution 


SAT-Based Subsumption Resolution in Saturation. Since subsumption 
is a stronger form of simplification, subsumption is checked before subsumption 
resolution. This means that subsumption resolution is applied only if subsump- 
tion fails for all candidate premises. We integrate Algorithm 1 within saturation 
so that it is used both for subsumption and subsumption resolution. 

Algorithms 2-3 display a variation of the integration of our SAT-based app- 
roach for checking subsumption resolution during saturation. Since most of the 
setup of subsumption is also required for subsumption resolution, both simplifica- 
tion rules are set up at the same time. As such, whenever turning to subsumption 
resolution, the same match set ms from Algorithm 2 can be reused, while also 
taking advantage of pruning steps performed during subsumption. 

We modified the forward simplification algorithm as described in Algorithm 4. 
In this new setting, checking the same pair (L, M) for subsumption directly 
followed by subsumption resolution enables us to use Algorithms 2 and 3 
efficiently. Algorithm 4 pays the price of checking subsumption resolution even 
if subsumption may succeed, but in practice inefficiencies in this respect are 
rarely seen. 


Algorithm 4. Forward simplification with SAT-based subsumption resolution 
procedure FORWARDSIMPLIFY(M, F) 
    M* ← NoSubsumptionResolution 
    for L ∈ F \ {M} do 
        if subsumption(L, M) is Subsumed then    ▷ using Algorithm 2 
            return ⊤    ▷ M is subsumed and removed 
        if M* = NoSubsumptionResolution then 
            M* ← subsumptionResolution(L, M)    ▷ using Algorithm 3 
    if M* ≠ NoSubsumptionResolution then 
        F ← F \ {M} ∪ {M*}    ▷ M* is the conclusion of subsumption resolution between L and M 
        return ⊤ 
    return ⊥ 


Algorithm 5. Evaluation of SAT-based subsumption resolution 
procedure FORWARDSIMPLIFYWRAPPER(M, F) 
    s ← startTimer() 
    r ← ForwardSimplify(M, F)    ▷ Benchmarked method 
    ▷ Prevent modification of F 
    e ← endTimer() 
    writeInFile(e − s) 
    r′ ← Oracle(M, F) 
    checkCoherence(r, r′)    ▷ Empiric check 
    return r′ 
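
The control flow of Algorithm 4 can be sketched on ground (propositional) clauses, where no substitutions are needed. This is a deliberate simplification: clauses are frozensets of non-zero integers, subsumption is set inclusion, and the two helper functions stand in for Algorithms 2 and 3. All names are assumptions of this sketch.

```python
NO_SR = None  # stands in for NoSubsumptionResolution

def subsumes(L, M):
    # Ground case: L subsumes M iff every literal of L occurs in M.
    return L <= M

def subsumption_resolution(L, M):
    # Ground case: find lit in L whose complement is in M such that
    # the rest of L is contained in the rest of M.
    for lit in L:
        if -lit in M and (L - {lit}) <= (M - {-lit}):
            return M - {-lit}
    return NO_SR

def forward_simplify(M, F):
    """Return (changed, new_F), following the control flow of Algorithm 4."""
    m_star = NO_SR
    for L in F - {M}:
        if subsumes(L, M):
            return True, F - {M}              # M is subsumed and removed
        if m_star is NO_SR:
            m_star = subsumption_resolution(L, M)
    if m_star is not NO_SR:
        return True, (F - {M}) | {m_star}     # replace M by the conclusion
    return False, F
```

Note that the first subsumption resolution conclusion is remembered but only used once no clause in F subsumes M, which mirrors the priority of subsumption over subsumption resolution.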


Role of Indices. When applying inferences that require terms or literals to 
unify or match, modern automated first-order theorem provers typically use 
term indices [9] to consider only viable candidates within the set of clauses. 
Subsumption and subsumption resolution are no exception. Our testbed system 
VAMPIRE currently uses a substitution tree to index clauses for matching by 
their literals (Sect. 7). 


7 Implementation and Experiments 


We implemented and integrated our SAT-based subsumption resolution approach 
in the saturation-based first-order theorem prover VAMPIRE [6].¹ 


¹ https://github.com/vprover/vampire/tree/robin_c-subsumption_resolution 
Versions compared. We use the following versions of VAMPIRE in our evaluation: 


• VAMPIRE_M is the master branch without SAT-based subsumption resolution; 

• VAMPIRE_I is the SAT-based subsumption resolution with the indirect encoding 
of Sect. 5.2 and a standard forward simplification algorithm with Algorithm 1 
(that is, Algorithm 4 is not used here); 

• VAMPIRE*_I uses the indirect encoding with Algorithms 2-4; 

• VAMPIRE*_D uses the direct encoding of Sect. 5.1 and Algorithms 2-4. 


Experimental Setting. To evaluate our work, we used the examples of the 
TPTP library (version 8.1.2) [15]. In our evaluation, 24926 problems were used 
out of the 25257 TPTP problems; the remaining problems are not supported by 
VAMPIRE (e.g., problems with both higher-order operators and polymorphism). 

Our experimental evaluation was done on a machine with two 32-core AMD 
Epyc 7502 CPUs clocked at 2.5 GHz and 1006 GiB of RAM (split into 8 memory 
nodes of 126 GiB shared by 8 cores). Each benchmark problem was run with 
the options -sa otter -t 60, meaning that we used the OTTER saturation 
algorithm [7] with a 60-second time-out. We use the OTTER strategy because 
it is the most aggressive in terms of simplification and therefore runs the most 
subsumption resolutions. We turned off the AVATAR framework (-av off) in 
order to have full control over SAT-based reasoning in VAMPIRE. 


Evaluation Setup. Our evaluation process is summarised in Algorithm 5, 
incorporating the following notes. 


• The conclusion clause of the subsumption resolution rule SR is not necessarily 
unique. Therefore, different versions of subsumption resolution, including our 
work based on direct and indirect SAT encodings, may not return the same 
conclusion clause of SR. Hence, applying different versions of subsumption 
resolution over the same clauses may change the saturation process. 

• Saturation with our SAT-based subsumption resolution takes advantage of 
subsumption checking (see Algorithms 3 and 4). Therefore, only checking 
subsumption resolution on pairs of clauses is neither a fair nor a viable comparison, 
as isolating subsumption checks from subsumption resolution is not what we 
aimed for (due to efficiency). 

• CPU cache influences results. For example, two consecutive runs of Algorithm 4 
may be up to 25% faster on second execution, due to cache effects. 


For the reasons above, we decided to measure the run time of a complete 
execution of Algorithm 4. To prevent the branches from changing, an Oracle is used 
to choose the path to follow. The Oracle is based on our indirect SAT encoding 
(VAMPIRE*_I). This way, the same computation graph is used for all evaluated 
methods. To prevent cache preheating, we run the Oracle after the respective 
evaluated method. This way the cache is in a normal state for the evaluated 
method. To measure the run time of Algorithm 4, a Wrapper method was built on 
top of the Forward Simplify procedure of Algorithm 4. This Wrapper replaces 
the Forward Simplify loop in VAMPIRE with minimal changes to the code. To 
empirically verify the correctness of our results, we used the Wrapper to compare 
the result of the evaluated method with the result of the Oracle. 
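
The measurement scheme of Algorithm 5 can be sketched as follows. This is only an illustration under stated assumptions: `method` and `oracle` are arbitrary callables taking (M, F), the coherence check defaults to plain equality (the actual check is not specified in the text), and the copy of F is taken before the timer starts.

```python
import time

def forward_simplify_wrapper(M, F, method, oracle, timings,
                             coherent=lambda a, b: a == b):
    """Time `method` on a copy of F, then run and follow the oracle."""
    snapshot = set(F)                       # the benchmarked method must not modify F
    start = time.perf_counter_ns()
    r = method(M, snapshot)                 # benchmarked method
    end = time.perf_counter_ns()
    timings.append((end - start) / 1000.0)  # elapsed time in microseconds
    r_oracle = oracle(M, F)                 # runs afterwards, so it cannot warm the cache
    if not coherent(r, r_oracle):
        raise AssertionError("evaluated method and oracle disagree")
    return r_oracle                         # the search always follows the oracle
```

Returning the oracle's result rather than the measured one is what keeps the sequence of Forward Simplify calls identical across all evaluated methods.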
Experimental Details and Analysis. Fig. 2 lists the cumulative instances 
solved by the respective VAMPIRE versions, highlighting the strength of forward 
simplifications for effective saturation. 


[Plot omitted; surviving legend entries: Vampire_M, Vampire*_I.] 

Fig. 2. Cumulative instances of applying subsumption resolution, using the TPTP 
examples. A point (n, t) on the graph means that n forward simplify loops were executed 
in less than t μs. The flatter the curve, the faster the VAMPIRE version is. 


Table 1. Average time spent in the Forward Simplify loop. VAMPIRE*_D is the fastest 
method, closely followed by VAMPIRE*_I. However, the indirect encoding is much 
more stable and has a lower variance. 


Prover     | Average  | Std. Dev.  | Speedup 
VAMPIRE_M  | 42.63 μs | 1609.06 μs | 0% 
VAMPIRE_I  | 40.13 μs | 1554.52 μs | 6.2% 
VAMPIRE*_D | 34.39 μs | 1047.85 μs | 23.9% 
VAMPIRE*_I | 34.55 μs | 250.25 μs  | 23.4% 


Remark 7.1. Our experimental summary in Fig. 2 shows the total number 
of Forward Simplify loops run within 60 s. However, the average and standard 
deviation were computed only on the intersection of the problems solved. That 
is, only the Forward Simplify loops finished by all the methods are taken into 
account. Otherwise, if a hard problem is solved in, for instance, 1000000 μs by 
one method, and times out for another, the average for the better method would 
increase a lot, but the weaker method would not be penalised. Table 1 summarises the 
average solving time of our evaluation. 
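
A small sketch of the averaging scheme described in Remark 7.1 follows. The data layout and names are assumptions, and two details are inferred rather than stated in the text: the speedup is taken as the baseline average divided by the method average, minus one (which agrees with the Speedup column of Table 1 up to rounding), and the population standard deviation is used.

```python
from statistics import mean, pstdev

def summarize(times_by_method, baseline):
    """times_by_method maps method -> {loop_id: time in microseconds}.
    Only loop ids measured for every method are taken into account."""
    common = set.intersection(*(set(t) for t in times_by_method.values()))
    base_avg = mean(times_by_method[baseline][k] for k in common)
    summary = {}
    for method, times in times_by_method.items():
        kept = [times[k] for k in common]
        avg = mean(kept)
        # (average, standard deviation, relative speedup over the baseline)
        summary[method] = (avg, pstdev(kept), base_avg / avg - 1.0)
    return summary
```

In the usage below, the slow loop 3 finished only by the baseline is ignored, so it does not inflate the baseline's average.

```python
data = {"base": {1: 40.0, 2: 60.0, 3: 1_000_000.0}, "new": {1: 20.0, 2: 30.0}}
print(summarize(data, "base"))
```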


Comparison of Encodings. We correlated the constraint building and SAT 
solving time with the length of clauses, using the different encodings of 
Sects. 5.1-5.2. Figure 3 shows that on larger clauses, the average computation time 
increases faster for the direct encoding than for the indirect encoding. 
(b) Average time (μs) for creating/solving indirect encoding constraints (Section 5.2). 


Fig. 3. Average time (μs) spent on creating and solving SAT-based subsumption 
resolution constraints. 


Table 2. Number of TPTP problems solved by the considered versions of VAMPIRE. 
The run was made using the options -sa otter -av off with a timeout of 60 s. The 
Gain/Loss column reports the difference of solved instances compared to VAMPIRE_M. 


Prover     | Total Solved | Gain/Loss 
VAMPIRE_M  | 10555        | baseline 
VAMPIRE*_D | 10667        | (+141, −29) 
VAMPIRE*_I | 10658        | (+133, −30) 

Experimental Summary. Our experiments show that VAMPIRE*_I yields the 
most stable approach for SAT-based subsumption resolution (Table 1), especially 
when it comes to solving large instances (Fig. 3). Our results demonstrate the 
superiority of SAT-based subsumption resolution used with forward simplifications 
in saturation (e.g., VAMPIRE*_D and VAMPIRE*_I), as shown in Table 2. 


8 Conclusion 


We advocate SAT solving for improving saturation-based first-order theorem 
proving. We encode powerful simplification rules, in particular subsumption res- 
olution, as SAT problems, triggering eager and efficient reasoning steps for the 
purpose of keeping proof search small. Our experiments with VAMPIRE showcase 
the benefit of SAT-based subsumption. In the future, we aim to further extend 
simplification rules with SAT solving, in particular focusing on subsumption 
demodulation for equality reasoning [3]. 
Abstract. IsaSAT is the most advanced verified SAT solver, but it did 
not yet feature inprocessing (to simplify and strengthen clauses). In order 
to improve performance, we enriched the base calculus to not only do 
CDCL but also inprocess clauses. We also replaced the target of our code 
synthesis by Isabelle/LLVM. With these improvements, we can solve 4 
times more SAT Competition 2022 problems than the original IsaSAT 
version, and 4.5 times more problems than any other verified SAT solver 
we are aware of. Additionally, our changes significantly reduce the trusted 
code base of our verification. 


1 Introduction 


SAT solving is a very important tool that has been extensively used in various 
applications like mathematics or cryptography. To ensure the correctness of the 
answer provided by a SAT solver, there are two approaches: either producing a 
certificate that can be checked independently or verifying a SAT solver. The first 
approach has been extensively studied and works very well in practice [19,26,28] 
— only checked proofs are counted in the SAT Competition [2]. 

The second approach, i.e., verifying a whole SAT solver, is orders of magnitude 
more complex than checking a certificate. To this end, the goal of the 
IsaFoL (Isabelle Formalization of Logic) [3] effort is to develop methodology and 
libraries for formalizing modern research in automated reasoning. In this con- 
text, we have verified a CDCL calculus (conflict-driven clause learning) and a 
two-watched literals data structure (Sect. 2). To show that they are useful, we 
have developed the verified SAT solver IsaSAT [8], which we later optimized [12]. 
To our surprise, it won the EDA Challenge 2021 defeating all the non-verified 
solvers, but, as expected, it finished last at the SAT Competition 2022 [2]. How- 
ever, the former used a much shorter timeout (200s, not announced before the 
competition) whereas the latter uses 5000s. 

In this paper, we present our new developments in IsaSAT, which make 
our solver arguably the most advanced formally verified SAT solver to date: 
inprocessing and verifying fast LLVM code [20] rather than slow functional code. 


© The Author(s) 2023 
B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 207-219, 2023. 
https://doi.org/10.1007/978-3-031-38499-8_12 
Inprocessing is a critical feature of modern SAT solvers (e.g., every winner of 
the SAT Competition since 2013 includes it). In order to use it in our formally 
verified solver, we had to extend our verified CDCL calculus: Our new PCDCL 
calculus includes features to encompass various inprocessing techniques, even if 
we have not yet implemented every possible technique (Sect. 3). 

We generate IsaSAT by exporting a model in the interactive theorem prover 
Isabelle [22] to executable code. Earlier we used Isabelle’s default code gen- 
erator to export to Standard ML (SML). However, the performance was not 
sufficient — especially memory consumption was very high. Thus, we switched to 
Isabelle/LLVM [18], which generates LLVM intermediate representation (LLVM 
IR). Apart from allowing faster imperative code, it also reduced the trusted 
code base (Sect. 4), replacing the rather niche MLton [27] compiler by only the 
backend of the widely used LLVM. 

Porting our entire development to Isabelle/LLVM required some changes 
and some cleanup. Moreover, when we implemented and verified inprocessing, 
we realized that some design decisions needed to be improved. In Sect. 5, we report 
on our experiences and lessons learned while porting and extending IsaSAT. 

Finally, we have benchmarked IsaSAT on the problems from the SAT Compe- 
tition 2022. We show that just porting IsaSAT from SML to Isabelle/LLVM sig- 
nificantly improved the performance, and the new inprocessing techniques com- 
bined with heuristic improvements give us another significant increase, demon- 
strating the usefulness of our PCDCL calculus (Sect. 6). 

This presentation is an extended version of our (non-peer-reviewed) system 
descriptions from the EDA Challenge 2021 [13] and the SAT Competition 2022 [6]. 
Compared to those versions, we provide many more details on PCDCL, our 
experience porting the development to LLVM, and performance tests. 


2 Preliminaries 


CDCL. CDCL is a procedure that builds a partial assignment called a trail 
either by guessing (called deciding) or propagating information. If the partial 
assignment is a model, the algorithm stops. If there is a conflict between the 
partial assignment and a clause, the partial assignment is repaired and a new 
clause is learned. For more details (beyond the scope of this paper), we refer the 
reader to the Handbook of Satisfiability [7]. 

We use a transition system for our formalization of CDCL [8]. Its state con- 
sists of the trail M, the (multi)sets of initial and learned clauses (N and U), 
and the conflict clause to analyze (or None if there is none). We show one rule, 
decide, that adds L to the current assignment M: 


inductive decide :: 'st ⇒ 'st ⇒ bool where 
  undefined_lit M L ⟹ |L| ∈ |N| ⟹ 
  decide (M, N, U, None) (L · M, N, U, None) 


If no conflict has been found so far (None), we add the new literal L at the begin- 
ning of the trail M. We prove that our set of rules is terminating and correct [8]. 
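
To make the shape of such a transition concrete, here is a small executable sketch of the decide rule on a state tuple (M, N, U, conflict). It is an illustration only, not the Isabelle formalization: literals are non-zero integers, the trail is a tuple, and the marking of decision literals is omitted.

```python
def decide(state, lit):
    """Apply the decide rule to state with literal lit.
    Returns the successor state, or None if a side condition fails."""
    trail, n, u, conflict = state
    atoms_n = {abs(l) for clause in n for l in clause}
    undefined = lit not in trail and -lit not in trail   # undefined_lit M L
    if conflict is None and undefined and abs(lit) in atoms_n:
        return ((lit,) + trail, n, u, conflict)          # L is put in front of M
    return None
```

For instance, with N containing only the clause {1, -2}, deciding 2 succeeds, while deciding -2 afterwards or deciding the unknown atom 3 is rejected.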
Code Synthesis. To generate the IsaSAT code, we start from the abstract rules 
like decide and gradually refine them to deterministic functions using the 
Refinement Framework [16]. Then, we rely on Sepref [17] to synthesize code: It 
takes an (Isabelle) function and synthesizes a new version, replacing functional 
data structures (like lists) by imperative data structures (like arrays). There 
are two versions of the tool. The older version, which we used before [8,12], 
uses Imperative HOL [9] and Isabelle’s standard (trusted) code generator [14] 
to export code into various functional languages. We used Standard ML (SML) 
with the compiler MLton [27], because it offers (by far) the best performance 
for our use case. The new Sepref is part of the Isabelle/LLVM library (devel- 
oped by the second author) and generates LLVM IR from a model of LLVM IR 
inside the theorem prover. The code generator interprets a shallow embedding 
of Isabelle/LLVM as equivalent to similar looking LLVM code. This reduces the 
trusted code base in two ways: first, the trusted pretty printer is simpler, and, 
second, instead of the rather niche full compiler MLton, we use only the backend 
of the widely used LLVM [20]. 

The biggest difference is that Imperative HOL allows arbitrarily large arrays 
and integers, whereas Isabelle/LLVM is more realistic, requiring integers (in 
particular array offsets, see Sect. 5.1) to have a fixed bit-width. 


Related Work. Our goal is to produce a fully verified SAT solver, without any 
runtime checks, that both terminates and returns a correct model while using 
efficient data structures. No other solver achieves all three goals. The SAT solver 
TRUESAT from Andrici and Ciobâcă [1] relies on the original DPLL and uses 
less efficient data structures (including counters instead of watch lists), but it 
terminates. Historically, this would roughly correspond to a SAT solver from the 
early 1990s. However, it only uses stateless heuristics, and it is not clear if the 
approach can be extended to CDCL (where the solver learns and keeps new 
clauses) or to stateful heuristics (like VSIDS [21]). The solvers VERSAT [23] and 
CreuSAT [25] go in a similar direction with CDCL instead of DPLL, but prove 
a weaker correctness property: they only show that an UNSAT result is correct, 
while a SAT result requires an additional check. Also, termination is not proved. 
Only proving this partial property makes many proofs considerably easier, in 
particular when adding restarts. Oe et al.’s solver VERSAT [23] was the first partially 
verified solver that could run benchmarks from the SAT Competition. More 
recently, Skotåm [25] has verified in his Master’s thesis the SAT solver CreuSAT 
using the Creusot framework (relying on Why3 internally). While CreuSAT is 
much faster than VERSAT in our tests, its correctness relies on (trusted) SMT 
solvers, and the proofs are not checked by a small kernel like our Isabelle code. 
However, the verification also takes much less time (a few minutes compared to 
several hours). 

Modern SAT solvers use inprocessing to make the subsequent CDCL run 
heuristically faster [15]. In particular, clauses are strengthened and global 
transformations (e.g., to remove variables) are applied. Two techniques (that we do 
not support), variable elimination and addition, slowly change the models of the 
formula by changing the set of variables. The SAT solver then reconstructs a 
model of the original formula at the end. Fazekas et al. [11] made this reconstruction 
compatible with incremental SAT solving. All other inprocessing techniques fit into our 
extended CDCL described in the next section. 


3 Pragmatic CDCL for Inprocessing 


SAT solvers nowadays apply a combination of CDCL (most of the time) and 
inprocessing (sometimes). Therefore, we extended our calculus similarly. At the 
core, we have our terminating CDCL. We also allow for formula transformation 
and restarts. We call the combination pragmatic CDCL or PCDCL. 


Splitting the Clause Set. Inprocessing makes it possible to strengthen and 
simplify clauses. However, we want models of the final set of clauses to remain 
models of the initial set of clauses. Deleting clauses is not possible: if we start 
with the clauses A ∨ C and B ∨ ¬B, removing the tautology means that the model 
A of A ∨ C is not a model of the initial clause set anymore (it does not assign B). 
Hence we want to keep the literal B without considering the tautology for 
propagation/conflict. 

To solve the issue we split our set of clauses into two parts: clauses that 
are useful for propagation and clauses that can be ignored but are kept for 
their literals. Thus we keep the set of all literals A constant. For our proof of 
refinement to the original CDCL, we have to make sure that the new behavior is 
also possible in the original calculus — in particular we do not miss propagations 
or conflicts. In the case of tautologies, this is simple (they are never used). For 
subsumption, consider A ∨ B subsuming A ∨ B ∨ C: whenever the latter 
propagates, the former is a conflict (or already propagates the same literal). 
Therefore, the behavior is compatible. 

While the idea of splitting our clauses seems surprising, the additional clause 
sets are only required for the connection to our CDCL transition system, and 
we entirely remove them when generating the code. Moreover, the refinement 
is easier as we do not have to update our heuristics to remove literals (and 
potentially shorten arrays). Finally, this is similar to the behavior of SAT solvers 
like Kissat [4]: while the clauses are removed, all literals of the problem are set. 

In our original refinement, we have split the clauses to distinguish between 
clauses of length 1 (where we cannot distinguish two distinct literals and thus 
they cannot fit into our two-watched literals data structures) and longer clauses, 
but the aim was only distinguishing on the length. 

One important point to notice is that the role of our clause sets changes. In 
our original CDCL, N was the (immutable) set of initial clauses and U contained 
the redundant clauses that can be removed at any point: N ensured that we do 
not gain new models during our transformations. Now, the set changes: strengthening 
an irredundant clause from N also shortens the clause that is in there. Therefore, 
a naive version could remove literals. 

Overall we have 4 sets of clauses: the irredundant clauses N and the redundant 
clauses U, each divided into the active clauses (N_a and U_a) and 
the inactive (discarded) clauses (N_d and U_d). For example, tautologies or 
subsumed clauses are discarded, but remain in N, so literals are never removed. In 
our development there are actually three sets (containing a literal set at level 0 
or tautologies, subsumed clauses, and false clauses) to reduce the number of case 
distinctions in some proofs. We never demote irredundant clauses to redundant 
ones, but we can promote redundant clauses to irredundant ones. 


Inprocessing Rules. Our aim when picking the rules is to be general (e.g., 
we can learn any useful clause) and then to specialize the rules to specific 
techniques. We will show this with the example of subsumption-resolution [7]. When 
doing subsumption-resolution, we resolve two clauses together if the conclusion 
is shorter. Then we can remove either one or both of the antecedents. For example, 
resolving A ∨ B ∨ C with A ∨ ¬C produces the clause A ∨ B, which subsumes 
the former clause. If the latter clause were A ∨ B ∨ ¬C, the resolvent would 
subsume both clauses. 
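
The example can be replayed with a few lines of code. In this sketch, clauses are frozensets of non-zero integers (with A, B, C encoded as 1, 2, 3, an arbitrary choice), and subsumption of duplicate-free propositional clauses is set inclusion.

```python
def resolve(c, d, atom):
    """Resolvent of c (containing atom) and d (containing its negation)."""
    assert atom in c and -atom in d
    return (c - {atom}) | (d - {-atom})

A, B, C = 1, 2, 3
first = frozenset({A, B, C})        # A ∨ B ∨ C
second = frozenset({A, -C})         # A ∨ ¬C
resolvent = resolve(first, second, C)

subsumes_first = resolvent <= first     # resolvent A ∨ B subsumes A ∨ B ∨ C
subsumes_second = resolvent <= second   # but not A ∨ ¬C

# Variant from the text: the second antecedent is A ∨ B ∨ ¬C instead.
second_variant = frozenset({A, B, -C})
variant_resolvent = resolve(first, second_variant, C)
subsumes_both = variant_resolvent <= first and variant_resolvent <= second_variant
```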

One of the most important inprocessing rules learns any possible clause. To 
simplify the presentation, we will only give the rules operating on the learned 
clauses, but similar rules exist for the initial set of clauses. 


inductive cdcl_learn_clause :: 'prag_st ⇒ 'prag_st ⇒ bool where 
  |C| ⊆ |N + N_d| ⟹ count_decided M = 0 ⟹ 
  N ⊎ N_d ⊨ C ⟹ ¬tautology C ⟹ distinct C ⟹ 
  cdcl_learn_clause (M, N, U, None, N_d, U_d) 
    (M, N, U ⊎ C, None, N_d, U_d) 


The side conditions not only require that the clause is entailed and duplicate-free, 
but also that the clause is not a tautology and that we do not break CDCL invariants 
(count_decided M = 0). Then we can deactivate subsumed clauses: 


inductive cdcl_subsumed :: 'prag_st ⇒ 'prag_st ⇒ bool where 
  C ⊆ D ⟹ count_decided M = 0 ⟹ 
  cdcl_subsumed (M, N, U ⊎ C ⊎ D, None, N_d, U_d) 
    (M, N, U ⊎ C, None, N_d, D ⊎ U_d) 
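
The effect of the subsumption rule on the state can be sketched as follows. This is an illustration rather than the verified calculus: clause multisets are Python lists of frozensets, the field names `n_d` and `u_d` for the discarded sets are chosen for this example, and only the variant operating on the redundant clauses U is shown.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    trail: list = field(default_factory=list)   # entries: (literal, is_decision)
    n: list = field(default_factory=list)       # active irredundant clauses
    u: list = field(default_factory=list)       # active redundant clauses
    conflict: object = None
    n_d: list = field(default_factory=list)     # discarded irredundant clauses
    u_d: list = field(default_factory=list)     # discarded redundant clauses

    def atoms(self):
        groups = (self.n, self.u, self.n_d, self.u_d)
        return {abs(l) for cs in groups for c in cs for l in c}

def cdcl_subsumed(state, c, d):
    """Move d from U to the discarded set if c subsumes d.
    Returns False (and changes nothing) if the rule does not apply."""
    no_decision = not any(is_dec for _lit, is_dec in state.trail)
    if c == d:
        both_present = state.u.count(c) >= 2     # two copies are needed
    else:
        both_present = c in state.u and d in state.u
    if not (both_present and c <= d and no_decision and state.conflict is None):
        return False
    state.u.remove(d)
    state.u_d.append(d)     # d is deactivated, not deleted
    return True
```

Because the subsumed clause is only moved, the set of atoms returned by `atoms()` is the same before and after the step.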


We combine these rules to express subsumption-resolution: We first learn the 
clause obtained by resolution. Then we can remove the antecedents. If either 
antecedent is in N, we also have to promote the conclusion from U to N. The 
advantage of our approach is that we can express other inprocessing techniques 
without adding new rules, only by specializing them. 

Overall we have 9 rules with some overlap with CDCL (propagation and 
conflict), but mostly simplification of clauses (removing true clauses and false 
literals from clauses) and pure literal deletion: When a literal always appears 
positively (or always negatively), we can set this literal to be true unconditionally 
(later removing all clauses containing it). Every model after adding the clause is 
also a model of the original set of clauses, but not the other way around. This is the 
first transformation that does not preserve models in IsaSAT or any other verified 
SAT solver. 
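
Pure literal deletion can be illustrated by the following sketch, which performs a single pass over clauses given as frozensets of non-zero integers. The function names are chosen for this example, and the sketch ignores the state bookkeeping of the calculus.

```python
def pure_literals(clauses):
    """Literals whose negation occurs in no clause."""
    lits = {l for c in clauses for l in c}
    return {l for l in lits if -l not in lits}

def eliminate_pure(clauses):
    """Set every pure literal to true and drop the clauses it satisfies.
    Returns (pure literals, remaining clauses)."""
    pure = pure_literals(clauses)
    return pure, [c for c in clauses if not (c & pure)]
```

For the clauses {1, 2}, {-2, 3}, {-3, 2}, literal 1 is pure. The original clause set has a model in which 1 is false (with 2 and 3 true), which is lost once 1 is forced to be true; this is the sense in which models are not preserved in both directions.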


Refinement of Subsumption-Resolution. While the definition of subsump- 
tion resolution is very simple, the refinement to code was challenging. 


212 M. Fleury and P. Lammich 


We verified forward subsumption [7] following CaDiCaL [5] (but unbounded, 
so all clauses selected heuristically are checked). We sort clauses by size 
and check if the current candidate is subsumed by one of the smaller clauses. 
Because we use two-watched literals, we need to distinguish between the binary 
clauses (that can produce new units) and the other clauses. In the end, we 
implemented two forward subsumption passes: one for binary clauses only and 
the other for larger clauses. 

To subsume the candidates, we build occurrence lists and populate them with 
binary clauses, whereas Kissat [5] reuses watch lists. Moreover, we need a new 
marking data structure for efficient detection of subsumption-resolution. 
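
The size-sorted forward subsumption pass with occurrence lists can be sketched as below. This is a simplified, unverified illustration: it uses a single pass for all clause sizes rather than separate passes for binary and larger clauses, registers every kept clause under one arbitrary literal, assumes non-empty clauses, and leaves out the marking data structure.

```python
from collections import defaultdict

def forward_subsumption(clauses):
    """Return (kept, subsumed) for clauses given as frozensets of integers."""
    occs = defaultdict(list)       # literal -> kept clauses registered under it
    kept, subsumed = [], []
    for cand in sorted(clauses, key=len):
        # A subsuming clause is registered under one of its literals,
        # and that literal must also occur in the candidate.
        if any(small <= cand for lit in cand for small in occs[lit]):
            subsumed.append(cand)
        else:
            kept.append(cand)
            occs[min(cand)].append(cand)
    return kept, subsumed
```

Sorting by size guarantees that any clause able to subsume the candidate has already been processed when the candidate is checked.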


4 Correctness of the Code and Completeness 


Our specification model_if_satisfiable takes the multiset of clauses and returns 
a model (if there is one) or None if the clauses are unsatisfiable. Our 
implementation IsaSAT_SML opts takes an array containing the clauses and returns an 
optional array containing the assignment, assuming that the clauses do not 
contain duplicated literals or the empty clause (precondition proper_lits_no_dups__). 
The additional argument opts activates and deactivates certain techniques for 
solving. The following theorem states that our implementation refines the 
specification: 


Theorem 1 (SML End-to-End Correctness). The following refinement 
relation holds: 


(IsaSAT_SML opts, model_if_satisfiable) 
∈ [proper_lits_no_dups_1] clauses_assn → option_model_assn 


The LLVM version is nearly the same. It can handle duplicated literals and 
the empty clause. Moreover, the new specification model_if_satisfiable_bounded 
allows for an unknown result if arrays would grow larger than the size permitted 
by the fixed bit-width. While this limit does not exist in Imperative HOL, it 
exists in practice as no machine supports arrays that large. Therefore, we tech- 
nically weakened our theorem, but did not change practical guarantees on the 
generated code. For IsaSAT_SML, we start [12] with 64-bit unsigned integers and 
only switch to GMP integers if the arrays grow too large. 


Theorem 2 (LLVM End-to-End Correctness). The following refinement 
relation holds: 


(IsaSAT_LLVM opts, RETURN ∘ model_if_satisfiable_bounded) 
∈ [proper_lits] clauses_assn → option_model_assn 


Moreover, the change from SML to LLVM reduces the trusted code base: 
The Isabelle/LLVM model is closer to the actual LLVM, such that the trusted 
pretty-printer is simpler. LLVM is also more low-level, such that fewer parts of 
the compiler have to be trusted. Finally, the LLVM compiler is more widely used 
and tested than the rather niche MLton compiler we used before. 


5 Experience Porting the Development to LLVM 


We report on the challenges we faced when updating the huge IsaSAT formaliza- 
tion (Sect. 5.1). Moreover, we report on the unverified parts of IsaSAT (Sect. 5.2), 
and finally compile some lessons learned (Sect. 5.3). 


5.1 Required Changes 


Before porting the development to LLVM, we removed our only remaining source 
of unbounded integers: the clause indices during the garbage collection. As 
garbage collection does not happen very often, we did not expect this to make 
a difference. Surprisingly, it turns out to have a performance impact. 

Isabelle/LLVM is an entire tool set, including a fork of the original Sepref 
tool. While related to the original Sepref tool, there are different libraries, and 
the development of the two versions has diverged. 

Initially, we tried to support both versions of Sepref. We ended up with two 
sets of files for code synthesis, and duplication of some libraries (to provide 
constants defined in Isabelle/LLVM but not in Sepref_SML). This significantly 
complicated our refinement approach, although we made it conceptually cleaner 
during the porting. Then, we realized that IsaSAT_LLVM was much faster than 
IsaSAT_SML (we observed a factor of 2 on our test files), and decided to discontinue 
the SML backend. 

With this, some workarounds for SML-specific performance issues (like 
the tuple uint32 * bool * uint64 being much less efficient than combining 
the uint32 and the Boolean into a single 64-bit number) also became obsolete. 
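
The kind of workaround mentioned here, packing a 32-bit unsigned value and a Boolean into one 64-bit word, can be sketched as follows. The bit layout (flag in bit 32) is an assumption made for this illustration; the text does not state which layout the SML code used.

```python
MASK32 = 0xFFFFFFFF

def pack(value32, flag):
    """Pack a 32-bit unsigned value and a Boolean into one 64-bit word."""
    assert 0 <= value32 <= MASK32
    return (int(flag) << 32) | value32

def unpack(word):
    """Inverse of pack: return (value32, flag)."""
    return word & MASK32, bool((word >> 32) & 1)
```

The triple then shrinks to a pair of machine words, which avoids a three-component tuple in the generated code.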


Compilation. We have experimented with compilation flags before to improve 
performance. We know from the SML code that we need to increase the level of 
inlining, because many small functions make the verification easier. The same 
applies to LLVM, and the easiest solution is to use link-time optimization, which 
increases the inlining level as a side effect. However, this makes profiling 
impossible, exactly as for the SML code, so there is no regression here. 


Tuples. In 2021, we observed a major performance regression of the synthesis, 
caused by a new feature in Sepref_LLVM: pointer-equality tracking caused 
quadratic behaviour for case-splits of tuples. As our solver state is a large tuple, 
synthesis became impossible (several dozen minutes for simple functions). 

To avoid the issue, we decided to work around it on the abstract level, using
getter and setter functions for the state’s components, rather than case splitting.
Now, every function on the state first gets the required components, updates
them, and then puts them back. For example:

definition rescore_conflict :: clause_index ⇒ isasat ⇒ isasat where
  rescore_conflict C S = do {
    let (M, S) = extract_trail S;
    ... (*reads the trail M and can change it*) ...
    let S = update_trail M S;
    RETURN S
  }


This makes synthesis much faster. However, the ownership model of Sepref does 
not allow aliasing, nor do our refinement relations allow leaving a ’gap’ in the 
state where we moved out an element. As an easy work-around, we resorted to 
placing dummy-values, like empty lists, in the state, hoping that LLVM would 
optimize away the allocations and deallocations for these values. However, this 
did not happen: In the hot-spot of the SAT solver, the propagation loop, the 
dummy value for the trail was recreated and freed each time. Thus, we locally 
resorted to unfolding our code to make sure that we need only one free in the 
inner propagation loop. We leave a more principled solution of this problem 
(possibly changing Sepref) to future work. 
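
The getter/setter pattern with dummy values can be pictured with the following sketch. It is not the Isabelle/LLVM code: the State class, its two fields, and the work done on the trail are hypothetical stand-ins, and the names only mirror the definition shown above.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    # Hypothetical two-component solver state; the real state is a large tuple.
    trail: list = field(default_factory=list)
    clauses: list = field(default_factory=list)


def extract_trail(state: State):
    """Move the trail out of the state and leave an empty list behind."""
    trail = state.trail
    state.trail = []  # dummy value: allocated here and discarded again on update
    return trail, state


def update_trail(trail: list, state: State) -> State:
    """Put the (possibly changed) trail back, replacing the dummy value."""
    state.trail = trail
    return state


def rescore_conflict(clause_index: int, state: State) -> State:
    trail, state = extract_trail(state)
    trail.append(("rescored", clause_index))  # stand-in for the real work
    return update_trail(trail, state)
```

Every call creates and drops one dummy list, which is the per-call cost that the text reports as not being optimized away in the propagation loop.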

We even attempted to go one step further (as the state-of-the-art SAT solver
Kissat [4] does) and simply pass a pointer to the state structure as an argument.
Since we had already changed our refinement to use accessors, we only had
to change them to work on a pointer. However, we never managed to make
the synthesized code efficient: we observed code that was slower by a factor of 10.
Hand-optimizing the accessors (basically making sure that LLVM understands that
we care only about one component) reduced this to a factor of 2. Once we
realized that the LLVM optimizer was replacing the pointer by the structure
passed directly as argument, we gave up on that approach.


5.2 Unverified Parts 


In the generated SAT solver, there are some parts that we cannot verify. First,
the parser is not verified, because the file system has no model in Isabelle (unlike
in CakeML, although conditions apply there as well). Therefore, we link the verified
code with an unverified C program, which provides the parser and command line
interface.

Second, Isabelle/LLVM does not support any output (like statistics, or the
DRAT proofs [28] required for the SAT Competition). For the SML version, we
could use a feature of Isabelle’s code generator to (axiomatically) implement
a function by some external function (e.g., a function that does nothing in the
model is implemented by a printing function). As Isabelle/LLVM does not yet have
such a feature, we resorted to post-processing the generated code (i.e., a function
that does nothing in the model is replaced by a printing function or even a function
storing some literals for DRAT proofs). Note that this post-processing is not
required for IsaSAT to work (but without it, IsaSAT will not print DRAT proofs).


5.3 Lessons Learned


Lesson 1: Embrace Duplication. We have already highlighted the importance 
of the set of all possible literals A, in particular to establish a bound on the 
size of various arrays. At first, we tried to avoid duplicating this set across the 
different components on the specification side. This, however, resulted in a closer 
coupling of the various refinement proofs, impeding modularity: data structures 
that, conceptually, are just a small part of the whole state, have to be formalized 
on the whole state, just to have the set A available. We solved this problem by 
duplicating the set A on the abstract level for all new data structures. Note that 
this duplication is removed in a later refinement stage. 


Lesson 2: The Limits are Isabelle Files. Checking our Isabelle files takes
nearly two hours. This can be explained by three factors: 1. the heuristic and
code synthesis amounts to 91 000 loc, making it a very large formalization; 2. the
synthesis is single-threaded (for technical reasons); 3. Sepref encourages a style
that is not very parallel: every refinement starts with a call to a tactic refine_vcg
that generates the goals (meaning that all successive tactics have to wait). To
improve performance, we have attempted [12] to generate more standard proofs
in Isar (by generating the text corresponding to the theorems to prove), but it
is not clear that this style is faster, as a huge number of variables is generated
(this style is required for more complicated proofs, however).

In order to improve Isabelle’s performance and speed up the testing of new
heuristics in IsaSAT_LLVM, we have split the files into three parts: the shared
definitions of the functions to refine, the (single-threaded) synthesis, and the
correctness proof of the refinement. Even with these optimizations, proof checking
still takes 2h. There is also no clear improvement path. The old SML version has
a similar problem, but it is overall faster because it has fewer features, making
the problem less critical.


Lesson 3: Performance Bugs exist. In order to improve performance, we
need to measure and observe performance. To solve that problem, IsaSAT prints
statistics and produces some timing information. The statistics during the run
made it possible to identify scheduling bugs for the different techniques: we
accidentally ran some techniques way too often or barely ever. Especially because
we increase the interval between two inprocessing rounds geometrically, simple
statistics at the end of the run are not sufficient. One interesting performance
bug we found was that we accidentally inverted reducing clauses (marking them
as removed) and garbage collection (physically removing them). Therefore, we
would nearly always physically delete clauses. We never saw this issue, because
we also printed the statistics inverted. To help debugging performance, we
produce some timing information by measuring time in the C program:


c propagate : 83.48% (581.66 s)
c reduce : 0.12% (0.82 s)
c subsumption : 0.06% (0.39 s)
c pure_lits : 0.05% (0.33 s)
c binary_simp : 0.02% (0.15 s)
c GC : 0.16% (1.10 s)


Fig. 1. CDF of the performance of SAT solvers


This helps to identify bottlenecks but also outliers where one technique is par- 
ticularly slow and requires some limits or a change in the scheduling to avoid 
slowing down the solver too much. This makes it possible to identify errors like 
allocations in loops. The overall timing matches what we expect from other 
SAT solvers (although usually they spend more time on inprocessing and less on 
propagation). 
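
A report in the style of the timing lines above could be produced along these lines. This is a hedged sketch in Python rather than the C program used by the authors; format_timings, its arguments, and the column width are assumptions made for illustration.

```python
def format_timings(timings: dict, total_seconds: float) -> list:
    """Format per-technique timings as comment lines with percentage shares.

    `timings` maps a technique name to the seconds spent in it and
    `total_seconds` is the overall solving time (both hypothetical inputs).
    """
    lines = []
    for name, seconds in timings.items():
        share = 100.0 * seconds / total_seconds
        lines.append(f"c {name:<12}: {share:.2f}% ({seconds:.2f} s)")
    return lines
```

Printing one line per technique makes outliers visible at a glance, for example a technique whose share is far larger than expected.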


6 Performance 


In order to study the performance, we have run three different IsaSAT versions:
the original SML solver (using MLton with the LLVM backend), the first port
of the IsaSAT solver, and the current version with inprocessing and various
other improvements on heuristics that do not require any change to our PCDCL
calculus, notably rephasing and target phases [10] (but no local search) and the
alternation between aggressive restarts (which heuristically seem better for UNSAT)
and few restarts (which seem better for SAT), following the ideas of Chanseok Oh [24].

We ran all the benchmarks from the SAT Competition 2022 on an Intel Xeon
E5-2620 v4 CPU at 2.10GHz (with turbo-mode disabled) with a memory limit 
of 7GB and a timeout of 5000s. For comparison, we have included VERSAT [23] 
and CreuSAT [25]. For completeness, we have included Kissat [6] (more precisely 
the bulky version submitted for the anniversary track). 

The results are given in Fig. 1 as a CDF (the higher the curve, the more solved
problems). The first surprise is that CreuSAT performs similarly to IsaSAT_SML
(37 vs 40 solved problems), worse than expected given the results reported in the
Master’s thesis [25] that tested on the 2015 benchmarks. We suspect that this is due
to the garbage collection and the fact that problems from the SAT Competition
have become harder.

There is a clear improvement when going from the SML version to the LLVM
version (98 solved), while the latest version solves 166. The SML version produces
335 out-of-memory errors (OOMs). The base LLVM version is more memory
efficient (23 OOMs), like the latest IsaSAT version (19 OOMs) or CaDiCaL, which
has the same memory layout (17 OOMs). However, there is still a large gap to
reach the performance level of Kissat and its inprocessing techniques.


7 Conclusion 


We have reported on updating our verified SAT solver IsaSAT to a more powerful 
base calculus (our pragmatic CDCL) which can express inprocessing, and to the 
more efficient Isabelle/LLVM backend. We have also compiled important lessons 
learned from proof-engineering and maintaining large formalizations like IsaSAT
(~200 kloc of proofs). 

Our changes made IsaSAT solve 4 times more problems (166/40), making it 
the most efficient verified SAT solver. At the same time, our verification is more 
complete than the next fastest verified solvers. 

Most techniques (including the two most important, vivification and probing) 
either fit into our new PCDCL base calculus or do not require any change (like 
random walk [10] that is conjectured to be the reason for the major performance 
improvement in 2020). One major technique that we cannot currently express is 
variable elimination, because models are changed and need to be fixed. We leave 
the required extensions to our PCDCL for future work. 


Acknowledgments. The work presented here was done over several years and several 
work places. The first author was supported for some time by Austrian Science Fund 
(FWF), NFN S11408-N23 (RiSE), and the LIT AI Lab funded by the State of Upper 
Austria. We thank the anonymous reviewers for their detailed comments. 


References 


1. Andrici, C.C., Ciobaca, S.: A verified implementation of the DPLL algorithm in 
Dafny. Mathematics 10(13), 1-26 (2022). https://ideas.repec.org/a/gam/jmathe/ 
v10y2022i113p2264-d850381.html 

2. Balyo, T., Heule, M., Iser, M., Järvisalo, M., Suda, M. (eds.): Proceedings of SAT
Competition 2022: Solver and Benchmark Descriptions. Department of Computer
Science Series of Publications B, Department of Computer Science, University of
Helsinki, Finland (2022)

3. Becker, H., et al.: IsaFoL: Isabelle formalization of logic. https://bitbucket.org/ 
isafol/isafol/ 

4. Biere, A., Fazekas, K., Fleury, M., Heisinger, M.: CaDiCaL, Kissat, Paracooba,
Plingeling and Treengeling entering the SAT Competition 2020. In: Balyo, T.,
Froleyks, N., Heule, M., Iser, M., Järvisalo, M., Suda, M. (eds.) Proceedings of SAT
Competition 2020 - Solver and Benchmark Descriptions. Department of Computer
Science Report Series B, vol. B-2020-1, pp. 51-53. University of Helsinki (2020)

5. Biere, A., Fazekas, K., Fleury, M., Heisinger, M.: CaDiCaL, Kissat, Paracooba,
Plingeling and Treengeling entering the SAT Competition 2021. In: Proceedings of
the SAT Competition 2021 - Solver and Benchmark Descriptions (2021). submitted

6. Biere, A., Fleury, M.: Gimsatul, IsaSAT and Kissat entering the SAT Competition
2022. In: Balyo, T., Heule, M., Iser, M., Järvisalo, M., Suda, M. (eds.) Proceedings
of the SAT Competition 2022 - Solver and Benchmark Descriptions. Department
of Computer Science Series of Publications B, vol. B-2022-1, pp. 10-11. University
of Helsinki (2022)

7. Biere, A., Järvisalo, M., Kiesl, B.: Preprocessing in SAT solving. In: Biere, A.,
Heule, M., van Maaren, H., Walsh, T. (eds.) Handbook of Satisfiability, Frontiers
in Artificial Intelligence and Applications. 2nd edn, vol. 336, pp. 391-435. IOS
Press (2021). https://doi.org/10.3233/FAIA200992

8. Blanchette, J.C., Fleury, M., Lammich, P., Weidenbach, C.: A verified SAT solver
framework with learn, forget, restart, and incrementality. J. Autom. Reason.
61(1-4), 333-365 (2018). https://doi.org/10.1007/s10817-018-9455-7

9. Bulwahn, L., Krauss, A., Haftmann, F., Erkök, L., Matthews, J.: Imperative
functional programming with Isabelle/HOL. In: Mohamed, O.A., Muñoz, C., Tahar, S.
(eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 134-149. Springer, Heidelberg (2008).
https://doi.org/10.1007/978-3-540-71067-7_14

10. Cai, S., Zhang, X., Fleury, M., Biere, A.: Better decision heuristics in CDCL
through local search and target phases. J. Artif. Intell. Res. 74, 1515-1563 (2022).
https://doi.org/10.1613/jair.1.13666

11. Fazekas, K., Biere, A., Scholl, C.: Incremental inprocessing in SAT solving. In:
Janota, M., Lynce, I. (eds.) SAT 2019. LNCS, vol. 11628, pp. 136-154. Springer,
Cham (2019). https://doi.org/10.1007/978-3-030-24258-9_9

12. Fleury, M.: Optimizing a Verified SAT Solver. In: Badger, J.M., Rozier, K.Y. (eds.)
NFM 2019. LNCS, vol. 11460, pp. 148-165. Springer, Cham (2019).
https://doi.org/10.1007/978-3-030-20652-9_10

13. Fleury, M.: IsaSAT and Kissat entering the EDA Challenge 2021 (2021).
https://www.eda-ai.org/results/, system description accepted at the EDA Challenge 2021.
https://m-fleury.github.io/ox-hugo/Fleury-EDA-Challenge-2021.pdf

14. Haftmann, F., Nipkow, T.: Code generation via higher-order rewrite systems. In:
Blume, M., Kobayashi, N., Vidal, G. (eds.) FLOPS 2010. LNCS, vol. 6009, pp.
103-117. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12251-4_9

15. Järvisalo, M., Heule, M.J.H., Biere, A.: Inprocessing rules. In: Gramlich, B., Miller,
D., Sattler, U. (eds.) IJCAR 2012. LNCS (LNAI), vol. 7364, pp. 355-370. Springer,
Heidelberg (2012). https://doi.org/10.1007/978-3-642-31365-3_28

16. Lammich, P.: Automatic data refinement. In: Blazy, S., Paulin-Mohring, C.,
Pichardie, D. (eds.) ITP 2013. LNCS, vol. 7998, pp. 84-99. Springer, Heidelberg
(2013). https://doi.org/10.1007/978-3-642-39634-2_9

17. Lammich, P.: Refinement to imperative HOL. J. Autom. Reason. 62(4), 481-503
(2017). https://doi.org/10.1007/s10817-017-9437-1

18. Lammich, P.: Generating verified LLVM from Isabelle/HOL. In: Harrison, J.,
O’Leary, J., Tolmach, A. (eds.) 10th International Conference on Interactive
Theorem Proving, ITP 2019, 9-12 September 2019, Portland, OR, USA. LIPIcs, vol.
141, pp. 22:1-22:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019).
https://doi.org/10.4230/LIPIcs.ITP.2019.22




19. Lammich, P.: Efficient verified (UN)SAT certificate checking. J. Autom. Reason. 
64(3), 513-532 (2019). https://doi.org/10.1007/s10817-019-09525-z

20. Lattner, C., Adve, V.: LLVM: a compilation framework for lifelong program analy- 
sis & transformation. In: International Symposium on Code Generation and Opti- 
mization, 2004. CGO 2004, pp. 75-88. IEEE (2004). https://doi.org/10.1109/cgo. 
2004.1281665 

21. Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: Engi- 
neering an efficient SAT solver. In: Proceedings of the 38th Design Automation 
Conference, DAC 2001, Las Vegas, NV, USA, 18-22 June 2001, pp. 530-535. ACM 
(2001). https://doi.org/10.1145/378239.379017

22. Nipkow, T., Wenzel, M., Paulson, L.C. (eds.): Isabelle/HOL: A Proof Assistant 
for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002).
https://doi.org/10.1007/3-540-45949-9

23. Oe, D., Stump, A., Oliver, C., Clancy, K.: versat: a verified modern SAT solver. In: 
Kuncak, V., Rybalchenko, A. (eds.) VMCAI 2012. LNCS, vol. 7148, pp. 363-378. 
Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27940-9_24 

24. Oh, C.: Between SAT and UNSAT: the fundamental difference in CDCL SAT. In: 
Heule, M., Weaver, S. (eds.) SAT 2015. LNCS, vol. 9340, pp. 307-323. Springer, 
Cham (2015). https://doi.org/10.1007/978-3-319-24318-4_23

25. Skotam, S.H.: CreuSAT - using rust and Creusot to create the world’s fastest 
deductively verified SAT solver. Master’s thesis, University of Oslo (2022). https:// 
www.duo.uio.no/handle/10852/96757 

26. Tan, Y.K., Heule, M.J.H., Myreen, M.O.: cake_lpr: verified propagation redun- 
dancy checking in CakeML. In: TACAS 2021. LNCS, vol. 12652, pp. 223-241. 
Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72013-1_12 

27. Weeks, S.: Whole-program compilation in MLton. In: Proceedings of the ACM 
Workshop on ML, 2006, Portland, Oregon, USA, 16 September 2006, p. 1. ACM 
Press (2006). https://doi.org/10.1145/1159876.1159877

28. Wetzler, N., Heule, M.J.H., Hunt, W.A.: DRAT-trim: efficient checking and trim- 
ming using expressive clausal proofs. In: Sinz, C., Egly, U. (eds.) SAT 2014. LNCS, 
vol. 8561, pp. 422-429. Springer, Cham (2014). https://doi.org/10.1007/978-3-319- 
09284-3_31 


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 




Proving Non-Termination by Acceleration 
Driven Clause Learning (Short Paper) 


Florian Frohn and Jürgen Giesl


LuFG Informatik 2, RWTH Aachen University, Aachen, Germany 


florian.frohn@cs.rwth-aachen.de, giesl1@informatik.rwth-aachen.de 


Abstract. We recently proposed Acceleration Driven Clause Learning 
(ADCL), a novel calculus to analyze satisfiability of Constrained Horn 
Clauses (CHCs). Here, we adapt ADCL to transition systems and intro- 
duce ADCL-NT, a variant for disproving termination. We implemented 
ADCL-NT in our tool LoAT and evaluate it against the state of the art. 


1 Introduction 


Termination is one of the most important properties of programs, and thus 
termination analysis is a very active field of research. Here, we are concerned 
with disproving termination of transition systems (TSs), a popular intermediate 
representation for verification of programs written in more expressive languages. 


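Example 1. Consider the following TS $\mathcal{T}$ with entry-point init and two further
locations $\ell_1, \ell_2$ over the variables $x, y, z$, where $x', y', z'$ represent the values
of $x, y, z$ after applying a transition, and $z^=$, $x{+}{+}$, and $x{-}{-}$ abbreviate $z' = z$,
$x' = x + 1$, and $x' = x - 1$. The first two transitions are a variant$^1$ of
chc-LIA-Lin_052 from the CHC Competition ’22 [7] and the last two are a
variant$^2$ of flip2_rec.jar-obl-8 from the Termination and Complexity
Competition (TermComp) [21].

\[
\begin{array}{rcll}
\mathsf{init} & \to & \ell_1\ [x' \le 0 \land z' \ge 5000 \land y' \le z'] & (\tau_{\mathsf{i}}) \\
\ell_1 & \to & \ell_1\ [y \le 2 \cdot z \land x{+}{+} \land ((x < z \land y^=) \lor (x \ge z \land y{+}{+})) \land z^=] & (\tau_{\ell_1}) \\
\ell_1 & \to & \ell_2\ [x = y \land x > 2 \cdot z \land x^= \land y^=] & (\tau_{\ell_1 \to \ell_2}) \\
\ell_2 & \to & \ell_2\ [x = y \land x > 0 \land x^= \land y{-}{-}] & (\tau^{=}_{\ell_2}) \\
\ell_2 & \to & \ell_2\ [x > 0 \land y > 0 \land x' = y \land ((x > y \land y' = x) \lor (x < y \land y^=))] & (\tau^{\ne}_{\ell_2})
\end{array}
\]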


1 We generalized the example to make it more interesting, and we added the condition
$y \le 2 \cdot z$ to enforce termination of $\tau_{\ell_1}$.

2 We combined the transitions for the cases $x > y$ and $x < y$ into the equivalent
transition $\tau^{\ne}_{\ell_2}$ to demonstrate how our approach can deal with disjunctions in conditions.

At $\ell_1$, $\mathcal{T}$ operates in two “phases”: First, just $x$ is incremented until $x$ reaches
$z$ (1st disjunct of $\tau_{\ell_1}$). Then, $x$ and $y$ are incremented until $y$ reaches $2 \cdot z + 1$
(2nd disjunct of $\tau_{\ell_1}$). If $x = y = c$ holds for some $c > 1$ at that point (which is
the case if $x \le y = z$ holds initially), then the execution can continue at $\ell_2$ as
follows:

\[
\ell_2(c, c, c_z) \to_{\tau^{=}_{\ell_2}} \ell_2(c, c - 1, c_z) \to_{\tau^{\ne}_{\ell_2}} \ell_2(c - 1, c, c_z) \to_{\tau^{\ne}_{\ell_2}} \ell_2(c, c, c_z) \to_{\tau^{=}_{\ell_2}} \ldots
\]

Here, $\ell_2(c, c, c_z)$ means that the current location is $\ell_2$ and the values of $x, y, z$
are $c, c, c_z$. The 1st and 2nd step with $\tau^{\ne}_{\ell_2}$ satisfy the 1st ($x > y \land \ldots$) and 2nd
($x < y \land \ldots$) disjunct of $\tau^{\ne}_{\ell_2}$’s condition, respectively. Thus, $\mathcal{T}$ does not terminate.
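
The cycle at the second location can be replayed with a small simulation. This is a sketch under assumptions: the two functions encode the two self-loops at that location as read from Example 1, the function names are hypothetical, and z is simply kept unchanged, which is one of the values the conditions allow.

```python
def step_eq(x, y, z):
    """Self-loop guarded by x = y and x > 0: keeps x, decrements y."""
    if x == y and x > 0:
        return (x, y - 1, z)
    return None  # transition not enabled


def step_neq(x, y, z):
    """Self-loop guarded by x > 0 and y > 0: sets x' = y, then
    y' = x if x > y (1st disjunct) or y' = y if x < y (2nd disjunct)."""
    if x > 0 and y > 0:
        if x > y:
            return (y, x, z)
        if x < y:
            return (y, y, z)
    return None  # transition not enabled


def cycle(c, cz):
    """Run one round of the pattern and return the visited valuations."""
    state = (c, c, cz)
    visited = [state]
    for step in (step_eq, step_neq, step_neq):
        state = step(*state)
        if state is None:
            raise ValueError("pattern not applicable for this start value")
        visited.append(state)
    return visited
```

For c > 1 the last visited valuation equals the first one, so the same three steps can be repeated forever. For c = 1 the second step is not enabled, which matches the requirement c > 1 in the text.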


Example 1 is challenging for state-of-the-art tools for several reasons. First,
more than 5000 steps are required to reach $\ell_2$, so reachability of $\ell_2$ is difficult to
prove for approaches that unroll the transition relation or use other variants of
iterative deepening. Thus, chc-LIA-Lin_052 is beyond the capabilities of most
other state-of-the-art tools for proving reachability.

Second, the pattern “$\tau^{=}_{\ell_2}$, 1st disjunct of $\tau^{\ne}_{\ell_2}$, 2nd disjunct of $\tau^{\ne}_{\ell_2}$” must be found
to prove non-termination. Therefore, flip2_rec.jar-obl-8 (which does not use
disjunctions) cannot be solved by other state-of-the-art termination tools.

Third, Example 1 contains disjunctions, which are not supported by 
many termination tools. Presumably, the reason is that most techniques for 
(dis)proving termination of loops are restricted to conjunctions (e.g., due to the 
use of templates and Farkas’ Lemma). While disjunctions can be avoided by 
splitting disjunctive transitions according to the DNF of their conditions, this 
leads to an exponential blow-up in the number of transitions. 

We present an approach that can prove non-termination of systems like
Example 1 automatically. To this end, we tightly integrate non-termination
techniques into our recent Acceleration Driven Clause Learning (ADCL) calculus
[16], which was originally designed for CHCs but can also be used to
analyze TSs.

Due to the use of acceleration techniques that compute the transitive closure 
of recursive transitions, ADCL finds long witnesses of reachability automatically. 
If acceleration techniques cannot be applied, it unrolls the transition relation, so 
it can easily detect complex patterns of transitions that admit non-terminating 
runs. Finally, ADCL reduces reasoning about disjunctions to reasoning about 
conjunctions by considering conjunctive variants of disjunctive transitions. Thus, 
combining ADCL with non-termination techniques for conjunctive transitions 
allows for disproving termination of TSs with complex Boolean structure. 

After introducing preliminaries in Sect. 2, Sect. 3 presents a straightforward
adaption of ADCL to TSs. Section 4 introduces our main contribution: ADCL-NT,
a variant of ADCL for proving non-termination. Finally, in Sect. 5, we discuss
related work and demonstrate the power of our approach by comparing it
with other state-of-the-art tools. All proofs can be found in [19].


2 Preliminaries


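We assume familiarity with basics from many-sorted first-order logic. $\mathcal{V}$ is a
countably infinite set of variables and $\mathcal{A}$ is a first-order theory over a $k$-sorted
signature $\Sigma_{\mathcal{A}}$ with carrier $\mathcal{C}_{\mathcal{A}} = (\mathcal{C}_{\mathcal{A},1}, \ldots, \mathcal{C}_{\mathcal{A},k})$. $QF(\Sigma_{\mathcal{A}})$ is the set of all
quantifier-free first-order formulas over $\Sigma_{\mathcal{A}}$, which are w.l.o.g. assumed to be in negation
normal form, and $QF_{\land}(\Sigma_{\mathcal{A}})$ only contains conjunctions of $\Sigma_{\mathcal{A}}$-literals. Given a
first-order formula $\eta$ over $\Sigma_{\mathcal{A}}$, $\sigma$ is a model of $\eta$ (written $\sigma \models_{\mathcal{A}} \eta$) if it is a model
of $\mathcal{A}$ with carrier $\mathcal{C}_{\mathcal{A}}$, extended with interpretations for $\mathcal{V}$ such that $\eta$ is satisfied.
As usual, $\models_{\mathcal{A}} \eta$ means that $\eta$ is valid, and $\eta \equiv_{\mathcal{A}} \eta'$ means $\models_{\mathcal{A}} \eta \iff \eta'$.

We write $\vec{x}$ for sequences and $x_i$ is the $i$-th element of $\vec{x}$. We use “::” for
concatenation of sequences, where we identify sequences of length 1 with their
elements, so we may write, e.g., $x :: xs$ instead of $[x] :: xs$.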


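Transition Systems. Let $d \in \mathbb{N}$ be fixed, and let $\vec{x}, \vec{x}' \in \mathcal{V}^d$ be disjoint vectors
of pairwise different variables. Each $\psi \in QF(\Sigma_{\mathcal{A}})$ induces a relation $\to_\psi$ on $\mathcal{C}_{\mathcal{A}}^d$
where $\vec{s} \to_\psi \vec{t}$ iff $\psi[\vec{x}/\vec{s}, \vec{x}'/\vec{t}]$ is satisfiable. So for the condition
$\psi := (x = y \land x > 0 \land x^= \land y{-}{-})$ of $\tau^{=}_{\ell_2}$, we have $(4,4,4) \to_\psi (4,3,7)$.
$\mathcal{L} \supseteq \{\mathsf{init}, \mathsf{err}\}$ is a finite set of locations. A configuration is a pair
$(\ell, \vec{s}) \in \mathcal{L} \times \mathcal{C}_{\mathcal{A}}^d$, written $\ell(\vec{s})$. A transition
is a triple $\tau = (\ell, \psi, \ell') \in \mathcal{L} \times QF(\Sigma_{\mathcal{A}}) \times \mathcal{L}$, written $\ell \to \ell'\ [\psi]$, and its condition
is $\mathsf{cond}(\tau) := \psi$. W.l.o.g., we assume $\ell \ne \mathsf{err}$ and $\ell' \ne \mathsf{init}$. Then $\tau$ induces
a relation $\to_\tau$ on configurations where $s \to_\tau t$ iff $s = \ell(\vec{s})$, $t = \ell'(\vec{t})$, and
$\vec{s} \to_\psi \vec{t}$. So, e.g., $\ell_2(4,4,4) \to_{\tau^{=}_{\ell_2}} \ell_2(4,3,7)$. We call $\tau$ recursive if $\ell = \ell'$,
conjunctive if $\psi \in QF_{\land}(\Sigma_{\mathcal{A}})$, initial if $\ell = \mathsf{init}$, and safe if $\ell' \ne \mathsf{err}$. Moreover, we
define $(\ell \to \ell'\ [\psi])|_{\psi'} := \ell \to \ell'\ [\psi']$. A transition system (TS) $\mathcal{T}$ is a finite set
of transitions, and it induces the relation $\to_{\mathcal{T}} := \bigcup_{\tau \in \mathcal{T}} \to_\tau$.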

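Chaining $\tau = \ell_s \to \ell_t\ [\psi]$ and $\tau' = \ell'_s \to \ell'_t\ [\psi']$ yields
$\mathsf{chain}(\tau, \tau') := (\ell_s \to \ell'_t\ [\psi_c])$ where $\psi_c := \psi[\vec{x}'/\vec{x}''] \land \psi'[\vec{x}/\vec{x}'']$ for fresh
$\vec{x}'' \in \mathcal{V}^d$ if $\ell_t = \ell'_s$, and $\psi_c := \bot$ (meaning false) if $\ell_t \ne \ell'_s$.
So $\to_{\mathsf{chain}(\tau, \tau')} = \to_\tau \circ \to_{\tau'}$, and $\mathsf{chain}(\tau_{\ell_1 \to \ell_2}, \tau^{=}_{\ell_2}) =
\ell_1 \to \ell_2\ [\psi]$ where $\psi \equiv_{\mathcal{A}} (x = y \land x > 2 \cdot z \land x > 0 \land x^= \land y{-}{-})$. For non-empty,
finite sequences of transitions we define $\mathsf{chain}([\tau]) := \tau$ and $\mathsf{chain}([\tau_1, \tau_2] :: \vec{\tau}) :=
\mathsf{chain}(\mathsf{chain}(\tau_1, \tau_2) :: \vec{\tau})$. We lift notations for transitions to finite sequences via
chaining. So $\mathsf{cond}(\vec{\tau}) := \mathsf{cond}(\mathsf{chain}(\vec{\tau}))$, $\vec{\tau}$ is recursive if $\mathsf{chain}(\vec{\tau})$ is recursive,
$\to_{\vec{\tau}} = \to_{\mathsf{chain}(\vec{\tau})}$, etc. If $\tau$ is initial and $\mathsf{cond}(\tau :: \vec{\tau}) \not\equiv_{\mathcal{A}} \bot$, then $(\tau :: \vec{\tau}) \in \mathcal{T}^+$
is a finite run. $\mathcal{T}$ is safe if every finite run is safe. If there is a $\sigma$ such that
$\sigma \models_{\mathcal{A}} \mathsf{cond}(\vec{\tau}')$ for every finite prefix $\vec{\tau}'$ of $\vec{\tau} \in \mathcal{T}^\omega$, then $\vec{\tau}$ is an infinite run. If
no infinite run exists, then $\mathcal{T}$ is terminating.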
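
Chaining can be illustrated by a brute-force sketch in which conditions are Python predicates over a source and a target valuation. The finite DOMAIN that stands in for the carrier, the triple representation, and all names are assumptions made for this illustration; the paper works with formulas and fresh variables instead.

```python
from itertools import product

DOMAIN = range(-2, 8)  # assumption: small finite carrier for brute-force search


def chain(t1, t2):
    """Chain two transitions given as (source, target, condition) triples."""
    src1, tgt1, cond1 = t1
    src2, tgt2, cond2 = t2
    if tgt1 != src2:
        # locations do not match: the chained condition is false
        return (src1, tgt2, lambda s, t: False)

    def cond(s, t):
        # the intermediate valuation m plays the role of the fresh variables
        return any(cond1(s, m) and cond2(m, t)
                   for m in product(DOMAIN, repeat=len(s)))

    return (src1, tgt2, cond)


# Valuations are (x, y, z) triples; both conditions follow Example 1.
tau_l1_l2 = ("l1", "l2", lambda s, t: s[0] == s[1] and s[0] > 2 * s[2]
             and t[0] == s[0] and t[1] == s[1])
tau_l2_eq = ("l2", "l2", lambda s, t: s[0] == s[1] and s[0] > 0
             and t[0] == s[0] and t[1] == s[1] - 1)
chained = chain(tau_l1_l2, tau_l2_eq)
```

The chained condition relates (3, 3, 1) to (3, 2, 5), as x = y, x > 2z, and x > 0 hold, x is unchanged, and y is decremented. It rejects the source (3, 3, 2), where x > 2z fails.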


Acceleration. Acceleration techniques compute the transitive closure of rela- 
tions. In the following definition, we only consider relations defined by conjunc- 
tive formulas, since many existing acceleration techniques do not support dis- 
junctions [4], or have to resort to approximations in the presence of disjunctions 
[13]. 

Definition 2 (Acceleration). An acceleration technique is a function $\mathsf{accel} :
QF_{\land}(\Sigma_{\mathcal{A}}) \to QF_{\land}(\Sigma_{\mathcal{A}'})$ such that $\to^+_\psi = \to_{\mathsf{accel}(\psi)}$, where $\mathcal{A}'$ is a
first-order theory. For recursive conjunctive transitions $\tau$, we define
$\mathsf{accel}(\tau) := \tau|_{\mathsf{accel}(\mathsf{cond}(\tau))}$.

So we clearly have $\to^+_\tau = \to_{\mathsf{accel}(\tau)}$. Note that most theories are not “closed
under acceleration”. E.g., accelerating the Presburger formula $x_1' = x_1 + x_2 \land x_2^=$
yields the non-linear formula $n > 0 \land x_1' = x_1 + n \cdot x_2 \land x_2^=$. If neither $\mathbb{N}$ nor $\mathbb{Z}$
are contained in $\mathcal{C}_{\mathcal{A}}$, then an additional sort for the range of $n$ is required in the
formula that results from applying accel. Hence, Definition 2 allows $\mathcal{A}' \ne \mathcal{A}$.
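
The Presburger example can be checked by comparing repeated single steps with the closed form. The following sketch only tests a finite range of integers, so it is a sanity check rather than a proof; the function names are hypothetical.

```python
def step(x1, x2):
    """One step of the loop body: x1' = x1 + x2, x2 unchanged."""
    return x1 + x2, x2


def iterate(x1, x2, n):
    """Apply the step n times."""
    for _ in range(n):
        x1, x2 = step(x1, x2)
    return x1, x2


def accelerated(x1, x2, n):
    """Closed form for n > 0 steps: x1' = x1 + n * x2, x2 unchanged."""
    if n <= 0:
        raise ValueError("the accelerated formula requires n > 0")
    return x1 + n * x2, x2
```

The product n * x2 of two variables is what makes the accelerated formula non-linear, even though every single step is linear.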


3 ADCL for Transition Systems 


We originally proposed the ADCL calculus to analyze satisfiability of linear Con- 
strained Horn Clauses (CHCs) [16]. Here, we rephrase it for TSs, and in Sect. 4, 
we modify it for proving non-termination. The adaption to TSs is straightforward 
as TSs can be transformed into equivalent linear CHCs and vice versa (see, e.g., 
[10]). 

To bridge the gap between transitions $\tau$ where $\mathsf{cond}(\tau) \in QF(\Sigma_{\mathcal{A}})$ and
acceleration techniques for formulas from $QF_{\land}(\Sigma_{\mathcal{A}})$, ADCL uses syntactic implicants.


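Definition 3 (Syntactic Implicants [16, Def. 6]). If $\psi \in QF(\Sigma_{\mathcal{A}})$, then:

\[
\begin{array}{rcll}
\mathsf{sip}(\psi, \sigma) & := & \bigwedge \{\pi \text{ is a literal of } \psi \mid \sigma \models_{\mathcal{A}} \pi\} & \text{if } \sigma \models_{\mathcal{A}} \psi \\
\mathsf{sip}(\psi) & := & \{\mathsf{sip}(\psi, \sigma) \mid \sigma \models_{\mathcal{A}} \psi\} & \\
\mathsf{sip}(\tau) & := & \{\tau|_{\psi} \mid \psi \in \mathsf{sip}(\mathsf{cond}(\tau))\} & \text{for transitions } \tau \\
\mathsf{sip}(\mathcal{T}) & := & \bigcup_{\tau \in \mathcal{T}} \mathsf{sip}(\tau) & \text{for TSs } \mathcal{T}
\end{array}
\]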


Here, sip abbreviates syntactic implicant projection. 


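As $\mathsf{sip}(\psi, \sigma)$ is restricted to literals from $\psi$, $\mathsf{sip}(\psi)$ is finite. Syntactic implicants
ignore the semantics of literals. So we have, e.g., $(X > 1) \notin \mathsf{sip}(X > 0 \land X > 1) =
\{X > 0 \land X > 1\}$. It is easy to show $\psi \equiv_{\mathcal{A}} \bigvee \mathsf{sip}(\psi)$, and thus $\to_\tau = \to_{\mathsf{sip}(\tau)}$.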
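
Syntactic implicants can be sketched on a tiny formula representation. Everything here is an assumption made for illustration: formulas in negation normal form are nested tuples, literals carry a name and a Python predicate, and models are enumerated from a finite list instead of being produced by an SMT solver.

```python
def lit(name, pred):
    return ("lit", name, pred)


def conj(f, g):
    return ("and", f, g)


def disj(f, g):
    return ("or", f, g)


def literals(f):
    """All literals that occur syntactically in the formula."""
    if f[0] == "lit":
        return [f]
    return literals(f[1]) + literals(f[2])


def holds(f, sigma):
    """Evaluate the formula under the valuation sigma."""
    if f[0] == "lit":
        return f[2](sigma)
    if f[0] == "and":
        return holds(f[1], sigma) and holds(f[2], sigma)
    return holds(f[1], sigma) or holds(f[2], sigma)


def sip_model(f, sigma):
    """Names of the literals of f satisfied by sigma, if sigma is a model of f."""
    if not holds(f, sigma):
        return None
    return frozenset(name for (_, name, pred) in literals(f) if pred(sigma))


def sip(f, models):
    """Set of syntactic implicants of f over the given candidate models."""
    found = (sip_model(f, sigma) for sigma in models)
    return {imp for imp in found if imp is not None}


MODELS = [{"X": v} for v in range(-5, 6)]  # assumption: finite candidate models
F_CONJ = conj(lit("X>0", lambda s: s["X"] > 0), lit("X>1", lambda s: s["X"] > 1))
F_DISJ = disj(lit("X<0", lambda s: s["X"] < 0), lit("X>0", lambda s: s["X"] > 0))
```

For F_CONJ the only implicant contains both literals, so the single literal X>1 is not an implicant although it is logically equivalent to the conjunction. For F_DISJ each disjunct yields its own implicant.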

Since $\mathsf{sip}(\tau)$ is worst-case exponential in the size of $\mathsf{cond}(\tau)$, we do not
compute it explicitly. Instead, ADCL constructs a run $\vec{\tau}$ step by step, and to
perform a step with $\tau$, it searches for a model $\sigma$ of $\mathsf{cond}(\vec{\tau} :: \tau)$. If such a model
exists, it appends $\tau|_{\mathsf{sip}(\mathsf{cond}(\tau), \sigma)}$ to $\vec{\tau}$. This corresponds to a step with a conjunctive
variant of $\tau$ whose condition is satisfied by $\sigma$. In other words, our calculus
constructs $\mathsf{sip}(\mathsf{cond}(\tau), \sigma)$ “on the fly” when performing a step with $\tau$, where
$\sigma \models_{\mathcal{A}} \mathsf{cond}(\vec{\tau} :: \tau)$.

The core idea of ADCL is to learn new, non-redundant transitions via accel- 
eration. Essentially, a transition is redundant if its transition relation is a subset 
of another transition’s relation. Thus, redundant transitions are not useful for 
(dis-)proving safety. 


Definition 4 (Redundancy [16, Def. 8]). A transition $\tau$ is (strictly) redundant
w.r.t. $\tau'$, denoted $\tau \sqsubseteq \tau'$ ($\tau \sqsubset \tau'$), if $\to_\tau \subseteq \to_{\tau'}$ ($\to_\tau \subset \to_{\tau'}$). For a
TS $\mathcal{T}$, we have $\tau \sqsubseteq \mathcal{T}$ ($\tau \sqsubset \mathcal{T}$) if $\tau \sqsubseteq \tau'$ ($\tau \sqsubset \tau'$) for some $\tau' \in \mathcal{T}$.

In the sequel, we assume oracles for redundancy, satisfiability of $QF(\Sigma_{\mathcal{A}})$-formulas,
and acceleration. In practice, we use incomplete techniques instead
(see Sect. 5).

From now on, let $\mathcal{T}$ be the TS that is being analyzed with ADCL. A state of
ADCL consists of a TS $\mathcal{S}$ that augments $\mathcal{T}$ with learned transitions, a run $\vec{\tau}$ of
$\mathcal{S}$ called the trace, and a sequence of sets of blocking transitions $[B_i]_{i=0}^{k}$, where
transitions that are redundant w.r.t. $B_k$ must not be appended to the trace.

The following definition introduces the ADCL calculus. It extends the trace 
step by step (using the rule STEP, which performs an evaluation step with a 
transition) and learns new transitions via acceleration (ACCELERATE) whenever 
a suffix of the trace is recursive. To avoid non-terminating ADCL-derivations, 
our notion of redundancy from Definition 4 is used to backtrack whenever a 
suffix of the trace corresponds to a special case of another (learned) transition 
(COVERED). Moreover, BACKTRACK is used whenever a run cannot be contin- 
ued. A more detailed explanation of ADCL is provided after Definition 5. 
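Definition 5 (ADCL [16, Def. 9, 10]). A state is a triple $(\mathcal{S}, [\tau_i]_{i=1}^{k}, [B_i]_{i=0}^{k})$
where $\mathcal{S} \supseteq \mathcal{T}$ is a TS, $B_i \subseteq \mathsf{sip}(\mathcal{S})$, and $[\tau_i]_{i=1}^{k} \in \mathsf{sip}(\mathcal{S})^*$. The transitions
in $\mathsf{sip}(\mathcal{T})$ are called original and the transitions in $\mathsf{sip}(\mathcal{S}) \setminus \mathsf{sip}(\mathcal{T})$ are learned.
A transition $\tau_{k+1} \sqsubseteq B_k$ is blocked, and $\tau_{k+1} \not\sqsubseteq B_k$ is active if $\mathsf{chain}([\tau_i]_{i=1}^{k+1})$ is
an initial transition with satisfiable condition (i.e., $[\tau_i]_{i=1}^{k+1}$ is a run). Let

\[
\mathsf{bt}(\mathcal{S}, [\tau_i]_{i=1}^{k}, [B_0, \ldots, B_k]) := (\mathcal{S}, [\tau_i]_{i=1}^{k-1}, [B_0, \ldots, B_{k-1} \cup \{\tau_k\}])
\]

where bt abbreviates “backtrack”. Our calculus is defined by the following rules.

\[
\dfrac{}{\mathcal{T} \leadsto (\mathcal{T}, [], [\emptyset])}\ (\textsc{Init})
\qquad
\dfrac{\tau \in \mathsf{sip}(\mathcal{S}) \text{ is active}}{(\mathcal{S}, \vec{\tau}, \vec{B}) \leadsto (\mathcal{S}, \vec{\tau} :: \tau, \vec{B} :: \emptyset)}\ (\textsc{Step})
\]
\[
\dfrac{\vec{\tau}^{\circlearrowleft} \text{ is recursive} \qquad |\vec{\tau}^{\circlearrowleft}| = |\vec{B}^{\circlearrowleft}| \qquad \mathsf{accel}(\vec{\tau}^{\circlearrowleft}) = \tau \not\sqsubseteq \mathsf{sip}(\mathcal{S})}{(\mathcal{S}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B} :: \vec{B}^{\circlearrowleft}) \leadsto (\mathcal{S} \cup \{\tau\}, \vec{\tau} :: \tau, \vec{B} :: \{\tau\})}\ (\textsc{Accelerate})
\]
\[
\dfrac{\vec{\tau}' \sqsubset \mathsf{sip}(\mathcal{S}) \ \text{ or } \ \vec{\tau}' \sqsubseteq \mathsf{sip}(\mathcal{S}) \land |\vec{\tau}'| > 1}{s = (\mathcal{S}, \vec{\tau} :: \vec{\tau}', \vec{B}) \leadsto \mathsf{bt}(s)}\ (\textsc{Covered})
\]
\[
\dfrac{\text{all transitions from } \mathsf{sip}(\mathcal{S}) \text{ are inactive} \qquad \tau \text{ is safe}}{s = (\mathcal{S}, \vec{\tau} :: \tau, \vec{B}) \leadsto \mathsf{bt}(s)}\ (\textsc{Backtrack})
\]
\[
\dfrac{\vec{\tau} \text{ is unsafe}}{(\mathcal{S}, \vec{\tau}, \vec{B}) \leadsto \mathsf{unsafe}}\ (\textsc{Refute})
\qquad
\dfrac{\text{all transitions from } \mathsf{sip}(\mathcal{S}) \text{ are inactive}}{(\mathcal{S}, [], [B]) \leadsto \mathsf{safe}}\ (\textsc{Prove})
\]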
We write $\overset{\textsc{I}}{\leadsto}, \overset{\textsc{S}}{\leadsto}, \ldots$ to indicate that the rule INIT, STEP, ... was used. STEP
adds a transition to the trace. When the trace has a recursive suffix, ACCELERATE
allows for learning a new transition which then replaces the recursive suffix
on the trace, or we may backtrack via COVERED if the recursive suffix is redundant.
Note that COVERED does not apply if $\vec{\tau}' \sqsubseteq \mathsf{sip}(\mathcal{S})$ and $|\vec{\tau}'| = 1$, as it could
immediately undo every STEP, otherwise. If no further STEP is possible, BACKTRACK
applies. Note that BACKTRACK and COVERED block the last transition
from the trace so that we do not perform the same STEP again. If $\vec{\tau}$ is an unsafe
run, REFUTE yields unsafe, and if the entire search space has been exhausted
without finding an unsafe run (i.e., if all initial transitions are blocked), PROVE
yields safe.

The definition of ADCL in [16] is more liberal than ours: In our setting, 
ACCELERATE may only be applied if the learned transition is non-redundant, and 
our definition of “active transitions” enforces that the first transition on the trace 
is always an initial transition. In [16], these requirements are not enforced by the 
definition of ADCL, but by the definition of reasonable strategies [16, Def. 14]. 
For simplicity, we integrated these requirements into Definition 5. Additionally, 
COVERED should be preferred over ACCELERATE, and ACCELERATE should be 
preferred over STEP. 
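
The bookkeeping of INIT, STEP, and the backtracking function bt can be sketched with plain data structures. This is an illustration only: transitions are represented by hypothetical string names, and the side conditions of the rules (activity, redundancy, acceleration) are deliberately not checked.

```python
def init(ts):
    """INIT: start with an empty trace and one empty set of blocking transitions."""
    return (frozenset(ts), [], [set()])


def step(state, tau):
    """STEP: append tau to the trace and open a fresh, empty blocking set."""
    s, trace, blocked = state
    return (s, trace + [tau], blocked + [set()])


def bt(state):
    """Backtrack: drop the last trace element and the last blocking set,
    and block the dropped transition in the new last blocking set."""
    s, trace, blocked = state
    *rest, last = trace
    *front, b_prev, _dropped = blocked
    return (s, rest, front + [b_prev | {last}])
```

A state with a trace of length k always carries k + 1 blocking sets, and bt preserves this invariant while recording which transition must not be tried again at that position.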


Example 6. We apply ADCL to a version of Example 1 with the additional 
transition 


fo er|z=yAr>2-zArAyAZ. (Terr) 

a 2 
T> (T, LIB) > (T, [n talveccl[2,2,2]) (eS 1Az>5kAy<z 
S (Si, [n Tt 12,2, {rte }) (z<z^Az>5k^y<z 


S 
~> (Si, PTa ea Us isa [Ø, D, {T<}, 2]) 
(r=z+1Az>5k^y<z+1 


A d 
~ (So, Rite Spl [S, 2, eh 51) 
(ta >yAu>z>5kAy<2-z24+1 


one S2, ar E ed Ae) 
(z =2-2+1l=yAz>5k 


B unsafe 
Here, 5k abbreviates 5000 and: 
Warez =yS2-zAattAr<zAyAz Warez = YS 2-zAttt+AxL> zAyttAz 


Tics = h > h fy <2- zAn>0AT =2tnAcin<zaAyrg 


Te = oh lytn-1<2-zAn>0Aa =a24+nAe>zdAy =y+nAz] 
Sı := T bie} S2 := Sı U {7i} 


On the right, we show formulas describing the configurations that are reachable
with the current trace. Every $\leadsto$-derivation starts with INIT. The first
two STEPs add the initial transition $\tau_i$ and an element of
$\mathrm{sip}(\tau_\ell)$ to the trace. Since $x < z$ holds after applying
$\tau_i$, the only possible choice for the latter is $\tau_\ell|_{\psi_{x<z}}$.

As $\tau_\ell|_{\psi_{x<z}}$ is recursive, it is accelerated and replaced with
$\mathrm{accel}(\tau_\ell|_{\psi_{x<z}}) = \tau^{+}_{x<z}$, which simulates $n$
steps with $\tau_\ell|_{\psi_{x<z}}$. Moreover, $\tau^{+}_{x<z}$ is also added to
the current set of blocking transitions, as we always have
$\to_{\tau}^{2} \subseteq \to_{\tau}$ for learned transitions $\tau$ and thus
adding them to the trace twice in a row is pointless.

Next, $\tau_\ell$ is applicable again. As neither $x < z$ nor $x \geq z$ holds
for all reachable configurations, we could continue with any element of
$\mathrm{sip}(\tau_\ell) = \{\tau_\ell|_{\psi_{x<z}}, \tau_\ell|_{\psi_{x \geq z}}\}$.
We choose $\tau_\ell|_{\psi_{x \geq z}}$, so that the recursive transition
$\tau_\ell|_{\psi_{x \geq z}}$ can be accelerated to $\tau^{+}_{x \geq z}$. Then
$\tau_{err}$ applies, and the proof is finished via REFUTE.


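For our purposes, the most important property of ADCL is the following.
Theorem 7. If $\mathcal{T} \leadsto^{*} (S, \vec{\tau}, \vec{B})$ and $\vec{\tau}$
is non-empty, then $\mathrm{cond}(\vec{\tau}) \not\equiv \bot$ and
$\to_{\vec{\tau}} \subseteq \to_{\mathcal{T}}^{+}$. So if
$\mathcal{T} \leadsto^{*} \text{unsafe}$, then $\mathcal{T}$ is unsafe.


The other properties of ADCL that were shown in [16] immediately carry over
to our setting, too: if $\mathcal{T} \leadsto^{*} \text{safe}$, then $\mathcal{T}$
is safe; if $\mathcal{T}$ is unsafe, then $\mathcal{T} \leadsto^{*} \text{unsafe}$;
in general, $\leadsto$ does not terminate. The proofs are analogous to [16].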


4 Proving Non-Termination with ADCL-NT 


From now on, we assume that the analyzed TS $\mathcal{T}$ does not contain unsafe
transitions. To prove non-termination, we look for a corresponding certificate.
Definition 8 (Certificate of Non-Termination). Let $\tau = \ell \to \ell\,[\ldots]$.
A satisfiable formula $\psi$ certifies non-termination of $\tau$, written
$\psi \models^{\infty} \tau$, if for any model $\sigma$ of $\psi$, there is an
infinite sequence $\ell(\sigma(\vec{x})) = s_1 \to_{\tau} s_2 \to_{\tau} \ldots$


There exist many techniques for finding certificates of non-termination
automatically, see Sect. 5. However, Definition 8 has several shortcomings.
First, the problem of finding such certificates becomes very challenging if
$\mathrm{cond}(\tau)$ contains disjunctions. Second, it is insufficient to
consider a single transition when only non-singleton sequences $\vec{\tau}$ such
that $\mathrm{chain}(\vec{\tau})$ is recursive admit non-terminating runs. Third,
just finding a certificate $\psi$ of non-termination for some
$\vec{\tau} \in \mathcal{T}^{*}$ does not suffice for proving non-termination of
$\mathcal{T}$. Additionally, a proof that
the pre-image of —+,,, is reachable from an initial configuration is required. 
All of these problems can be solved by integrating the search for certificates of 
non-termination into the ADCL calculus. 


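Definition 9 (ADCL-NT). To prove non-termination, we extend ADCL with
the rule NONTERM and modify COVERED as shown below. We write $\leadsto_{nt}$ for
the relation defined by the (modified) rules from Definition 5 and NONTERM.

\[
\frac{\vec{\tau}^{\circlearrowleft} \text{ is recursive} \quad \vec{\tau}^{\circlearrowleft} \sqsubset \mathrm{sip}(S) \ \text{ or } \ \vec{\tau}^{\circlearrowleft} \sqsubseteq \mathrm{sip}(S) \land |\vec{\tau}^{\circlearrowleft}| > 1}
     {s = (S, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B}) \leadsto_{nt} \mathrm{bt}(s)}\ (\text{COVERED})
\]
\[
\frac{\mathrm{chain}(\vec{\tau}^{\circlearrowleft}) = \ell \to \ell\,[\ldots] \quad \psi \models^{\infty} \vec{\tau}^{\circlearrowleft} \quad \tau = \ell \to \mathsf{err}\,[\psi] \not\sqsubseteq \mathrm{sip}(S)}
     {(S, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B}) \leadsto_{nt} (S \cup \{\tau\}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B})}\ (\text{NONTERM})
\]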


So the idea of NONTERM is to apply a technique which searches for a certifi- 
cate of non-termination to a recursive suffix of the trace. Apart from introducing 
NONTERM, we restricted COVERED to recursive suffixes. The reason is that back- 
tracking when the trace has a redundant, non-recursive suffix may prevent us 
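from analyzing loops, resulting in a precision issue.

Example 10. Let $\mathcal{T} := \{\tau_i, \tau_i', \tau_\ell, \tau_{\ell'}\}$ where

\[
\tau_i := \mathsf{init} \to \ell\,[\top] \qquad
\tau_i' := \mathsf{init} \to \ell'\,[\top] \qquad
\tau_\ell := \ell \to \ell'\,[\top] \qquad
\tau_{\ell'} := \ell' \to \ell\,[\top]
\]

and $\top$ means true. Due to the loop
$\ell \to_{\tau_\ell} \ell' \to_{\tau_{\ell'}} \ell$, $\mathcal{T}$ is clearly
non-terminating. Without requiring that $\vec{\tau}^{\circlearrowleft}$ is
recursive in COVERED, $\mathcal{T}$ can be analyzed as follows:

\begin{align*}
\mathcal{T}
 &\overset{\text{I}}{\leadsto}_{nt} (\mathcal{T}, [\,], [\varnothing])
  \overset{\text{S}}{\leadsto}_{nt} (\mathcal{T}, [\tau_i], [\varnothing, \varnothing])
  \overset{\text{S}}{\leadsto}_{nt} (\mathcal{T}, [\tau_i, \tau_\ell], [\varnothing, \varnothing, \varnothing]) \\
 &\overset{\text{C}}{\leadsto}_{nt} (\mathcal{T}, [\tau_i], [\varnothing, \{\tau_\ell\}])
  \overset{\text{B}}{\leadsto}_{nt} (\mathcal{T}, [\,], [\{\tau_i\}]) \\
 &\overset{\text{S}}{\leadsto}_{nt} (\mathcal{T}, [\tau_i'], [\{\tau_i\}, \varnothing])
  \overset{\text{S}}{\leadsto}_{nt} (\mathcal{T}, [\tau_i', \tau_{\ell'}], [\{\tau_i\}, \varnothing, \varnothing]) \\
 &\overset{\text{C}}{\leadsto}_{nt} (\mathcal{T}, [\tau_i'], [\{\tau_i\}, \{\tau_{\ell'}\}])
  \overset{\text{B}}{\leadsto}_{nt} (\mathcal{T}, [\,], [\{\tau_i, \tau_i'\}])
  \overset{\text{P}}{\leadsto}_{nt} \text{safe}
\end{align*}

The 1st application of COVERED is possible as
$[\tau_i, \tau_\ell] \sqsubseteq \tau_i'$ and the 2nd application of COVERED is
possible as $[\tau_i', \tau_{\ell'}] \sqsubseteq \tau_i$. Note that the trace
never contains both $\tau_\ell$ and $\tau_{\ell'}$, but both transitions are
needed to prove non-termination.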


Recall the shortcomings of Definition 8 mentioned above. First, due to the
use of syntactic implicants, ADCL-NT reduces reasoning about arbitrary
transitions to reasoning about conjunctive transitions. Second, as NONTERM
considers a suffix $\vec{\tau}^{\circlearrowleft}$ of the trace, it can prove
non-termination of sequences of transitions. Third, ADCL's capability to prove
reachability directly carries over to our goal of proving non-termination. So
in contrast to most other approaches (see Sect. 5), ADCL-NT does not have to
resort to other tools or techniques for proving reachability.

We only search for a certificate of non-termination for
$\vec{\tau}^{\circlearrowleft}$ if ADCL-NT established reachability of the
pre-image of $\to_{\vec{\tau}^{\circlearrowleft}}$ beforehand. Note, however,
that this does not imply reachability of the pre-image of
$\to_{\ell \to \mathsf{err}[\psi]}$, as $\psi$ entails
$\mathrm{cond}(\vec{\tau}^{\circlearrowleft})$, but not the other way around.
Hence, we cannot directly derive non-termination of $\mathcal{T}$ when NONTERM
applies. Regarding the strategy for $\leadsto_{nt}$, one should try to use
NONTERM once for each recursive suffix of the trace.


Example 11. Reconsider Example 1. Up to (excluding) the second-last step, the 
derivation from Example 6 remains unchanged. Then we get 


(Sa, RB es ele bee) (£ > y Ax > 5k) 
s4 +o + -= 4 # _ 
~~ nt (S2, mee et Sx) Thre TH TH, theses AEE [. : J) (1 =2 Y5 T > 10k) 
N Ea = 
nt (S3; [ti Takes Toa Thala) Te) Tey bay) Th ecyls e+ ]) (1 =2 y = @ > 10k) 
S + $ R 
~ nt (S3, are Taasi Tr>z) Tb» Thos Fico liseyi ten, |: : J) ~>nt unsafe 


where Yr>y:=£>OAY>OALT SYAT>YNAY =T Ter i= b > err |z =y > 1] 
Percy =T > OAY > OAT =yArK<yAY S3 := S2 U {Terr} 


The formulas on the right describe the values of x and y that are reachable with 
the current trace, where $1 \equiv_2 y$ means that y is odd. After the first STEP with
Thb, just Tj, can be used, as cond(tv,.¢,) implies x’ = y’. While r% is recur-

sive, ACCELERATE cannot be applied next, as ye = — Tt, so the learned 
2 Lo 

transition would be redundant. Thus, we continue with 7, , projected to x > y 

(as cond(7;;) implies x’ = y’ +1). Again, all transitions that could be learned are 


# 


redundant, so ACCELERATE does not apply. We next use T/, projected to x < y, 


2 
as the previous STEP swapped x and y. As the suffix [7;,, 7 hiss Te loss] of 
the trace does not terminate (see Example 1), NONTERM applies. So we learn 
the transition Terr, which is added to the trace to finish the proof, afterwards. 


Theorem 12. If $\mathcal{T} \leadsto_{nt}^{*} \text{unsafe}$, then $\mathcal{T}$
does not terminate.


While Theorem 12 establishes the soundness of our approach, we now investigate
completeness. In contrast to ADCL for safety (Sect. 3), ADCL-NT is not
refutationally complete, but the proof is non-trivial. So in the following, we
show that there are non-terminating TSs $\mathcal{T}$ where
$\mathcal{T} \not\leadsto_{nt}^{*} \text{unsafe}$. To prove incompleteness, we
adapt the construction from the proof that ADCL does not terminate
[16, Thm. 18]. There, states $(S, \vec{\tau}, \vec{B})$ were extended by a
component $\mathcal{L}$ that maps every element of $\mathrm{sip}(S)$ to a regular
language over $\mathrm{sip}(\mathcal{T})$. However, the proof of [16, Thm. 18]
just required reasoning about finite (prefixes of infinite) runs, but we have to
reason about infinite runs. So in our setting $\mathcal{L}$ maps each element
$\tau$ of $\mathrm{sip}(S)$ to a regular or an $\omega$-regular language over
$\mathrm{sip}(\mathcal{T})$, i.e.,
$\mathcal{L}(\tau) \subseteq \mathrm{sip}(\mathcal{T})^{*}$ or
$\mathcal{L}(\tau) \subseteq \mathrm{sip}(\mathcal{T})^{\omega}$. We lift
$\mathcal{L}$ from $\mathrm{sip}(S)$ to sequences of transitions as follows.


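\[
\mathcal{L}(\varepsilon) := \varepsilon \qquad
\mathcal{L}(\vec{\tau} :: \tau) := \mathcal{L}(\vec{\tau}) :: \mathcal{L}(\tau)
\quad \text{if } \mathcal{L}(\vec{\tau}) \subseteq \mathrm{sip}(\mathcal{T})^{*}
\]


Here, "::" denotes language concatenation (i.e.,
$\mathcal{L}_1 :: \mathcal{L}_2 = \{\vec{\tau}_1 :: \vec{\tau}_2 \mid \vec{\tau}_1 \in \mathcal{L}_1, \vec{\tau}_2 \in \mathcal{L}_2\}$)
and we only consider sequences where $\mathcal{L}(\vec{\tau})$ is regular (not
$\omega$-regular) to ensure that $\mathcal{L}$ is well defined. So while we lift
other notations to sequences of transitions via chaining,
$\mathcal{L}(\vec{\tau})$ does not stand for
$\mathcal{L}(\mathrm{chain}(\vec{\tau}))$.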


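Definition 13 (ADCL-NT with Regular Languages). We extend states
by a fourth component $\mathcal{L}$, and adapt INIT, ACCELERATE, and NONTERM as
follows:

\[
\frac{\mathcal{L}(\tau) = \{\tau\} \text{ for all } \tau \in \mathrm{sip}(\mathcal{T})}
     {\mathcal{T} \leadsto_{nt} (\mathcal{T}, [\,], [\varnothing], \mathcal{L})}\ (\text{INIT})
\]
\[
\frac{\vec{\tau}^{\circlearrowleft} \text{ is recursive} \quad |\vec{\tau}^{\circlearrowleft}| = |\vec{B}^{\circlearrowleft}| \quad \mathrm{accel}(\vec{\tau}^{\circlearrowleft}) = \tau \not\sqsubseteq \mathrm{sip}(S)}
     {(S, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B} :: \vec{B}^{\circlearrowleft}, \mathcal{L}) \leadsto_{nt} (S \cup \{\tau\}, \vec{\tau} :: \tau, \vec{B} :: \{\tau\}, \mathcal{L} \uplus (\tau \mapsto \mathcal{L}(\vec{\tau}^{\circlearrowleft})^{+}))}\ (\text{ACCELERATE})
\]
\[
\frac{\mathrm{chain}(\vec{\tau}^{\circlearrowleft}) = \ell \to \ell\,[\ldots] \quad \psi \models^{\infty} \vec{\tau}^{\circlearrowleft} \quad \tau = \ell \to \mathsf{err}\,[\psi] \not\sqsubseteq \mathrm{sip}(S)}
     {(S, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B}, \mathcal{L}) \leadsto_{nt} (S \cup \{\tau\}, \vec{\tau} :: \vec{\tau}^{\circlearrowleft}, \vec{B}, \mathcal{L} \uplus (\tau \mapsto \mathcal{L}(\vec{\tau}^{\circlearrowleft})^{\omega}))}\ (\text{NONTERM})
\]

All other rules from Definition 5 leave the last component of the state unchanged.


Here, $\mathcal{L}(\vec{\tau})^{+} = \bigcup_{n \in \mathbb{N}_{\geq 1}} \mathcal{L}(\vec{\tau})^{n}$,
and $\mathcal{L}(\vec{\tau})^{\omega}$ is the $\omega$-regular language consisting
of all words that result from concatenating infinitely many elements of
$\mathcal{L}(\vec{\tau}) \setminus \{\varepsilon\}$.

In ACCELERATE and NONTERM, $\mathrm{chain}(\vec{\tau}^{\circlearrowleft})$ is
recursive. Thus, $\vec{\tau}^{\circlearrowleft}$ does not contain unsafe
transitions. Hence, $\mathcal{L}(\vec{\tau}^{\circlearrowleft})$ and thus also
$\mathcal{L}(\vec{\tau}^{\circlearrowleft})^{+}$ are well defined and regular, and
$\mathcal{L}(\vec{\tau}^{\circlearrowleft})^{\omega}$ is $\omega$-regular.
Moreover, the use of "$\uplus$" is justified by the condition
$\tau \not\sqsubseteq \mathrm{sip}(S)$. The next lemma states two crucial
properties about $\mathcal{L}$.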


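Lemma 14. Assume $\mathcal{T} \leadsto_{nt}^{*} (S, \vec{\tau}, \vec{B}, \mathcal{L})$
and let $\tau = (\ell \to \ell'\,[\psi]) \in \mathrm{sip}(S)$.


• If $\mathcal{L}(\tau) \subseteq \mathrm{sip}(\mathcal{T})^{*}$, then
  $\to_{\tau} = \bigcup_{\vec{\tau} \in \mathcal{L}(\tau)} \to_{\vec{\tau}}$.
• If $\mathcal{L}(\tau) \subseteq \mathrm{sip}(\mathcal{T})^{\omega}$, then for
  every model $\sigma$ of $\psi$, there is an infinite sequence
  $\ell(\sigma(\vec{x})) = s_1 \to_{\tau_1} s_2 \to_{\tau_2} \ldots$ where
  $[\tau_1, \tau_2, \ldots] \in \mathcal{L}(\tau)$.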


Based on this lemma, we can prove that our extension of $\leadsto_{nt}$ from
Definition 13 is not refutationally complete. Then refutational incompleteness
of ADCL-NT as introduced in Definition 9 follows immediately. The reason is that
$\mathcal{L}$ is only used in the premise of INIT in Definition 13, but there the
requirement "$\mathcal{L}(\tau) = \{\tau\}$ for all
$\tau \in \mathrm{sip}(\mathcal{T})$" is trivially satisfiable by choosing
$\mathcal{L}$ accordingly.


Theorem 15. There is a non-terminating TS $\mathcal{T}$ such that
$\mathcal{T} \not\leadsto_{nt}^{*} \text{unsafe}$.


Proof (Sketch). As in the proof of [16, Thm. 18], for any (original or learned)
transition $\tau$ such that $\mathcal{L}(\tau)$ is regular, $\mathcal{L}(\tau)$
contains at most one square-free word (i.e., a word without a non-empty infix
$w :: w$). Thus, if $\mathcal{L}(\tau)$ is $\omega$-regular, then
$\mathcal{L}(\tau)$ does not contain an infinite square-free word. Moreover, as
in the proof of [16, Thm. 18], one can construct a TS $\mathcal{T}$ that admits a
single infinite run $\vec{\tau}$, and this infinite run is square-free. Thus,
there is no transition $\tau$ such that $\mathcal{L}(\tau)$ contains a suffix of
$\vec{\tau}$, i.e., no $\leadsto_{nt}$-derivation starting with $\mathcal{T}$
corresponds to $\vec{\tau}$. Hence, by Lemma 14, assuming
$\mathcal{T} \leadsto_{nt}^{*} \text{unsafe}$ results in a contradiction.
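The notion of a square-free word used in the proof sketch can be made concrete for finite words. The following is a small illustrative sketch with hypothetical helper names (`has_square`, `is_square_free`); it only covers finite sequences and says nothing about how the infinite square-free run in the proof is constructed.

```python
def has_square(word):
    """Return True iff `word` contains a non-empty infix of the form w :: w."""
    n = len(word)
    for start in range(n):
        # `half` is the length of the candidate w starting at `start`.
        for half in range(1, (n - start) // 2 + 1):
            left = word[start:start + half]
            right = word[start + half:start + 2 * half]
            if left == right:
                return True
    return False


def is_square_free(word):
    """A word is square-free iff it has no non-empty infix w :: w."""
    return not has_square(word)


# Repeating a loop body twice, as an accelerated transition may do,
# always produces a square.
loop_body = ("tau_0", "tau_1")
repeated_twice = loop_body * 2
```

This reflects the intuition behind the proof: a word obtained by repeating the same recursive suffix at least twice contains a square, so a run that is square-free cannot be captured by such repetitions.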


Since ADCL can prove unsafety as well as safety, it is natural to ask if there
is a dual to ADCL-NT that can prove termination. The most obvious approach
would be the following: Whenever the trace has a recursive suffix
$\vec{\tau}^{\circlearrowleft}$, then termination of
$\vec{\tau}^{\circlearrowleft}$ needs to be proven before the next
$\leadsto$-step. The following example shows that this is not enough to ensure
that $\mathcal{T} \leadsto^{*} \text{safe}$ implies termination of $\mathcal{T}$.


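Example 16. Let
$\mathcal{T} := \{\tau_i = \mathsf{init} \to \ell\,[\psi_i]\} \cup \{\tau_m = \ell \to \ell\,[\psi_m] \mid 0 \leq m \leq 2\}$
and

\[
\psi_i := x' = 0 \qquad
\psi_0 := x = 0 \land x' = 1 \qquad
\psi_1 := x = 1 \land x' = 2 \qquad
\psi_2 := x = 2 \land x' = 1.
\]

As we have $\ell(1) \to_{\tau_1} \ell(2) \to_{\tau_2} \ell(1)$, $\mathcal{T}$ is
clearly non-terminating. We get:

\begin{align*}
\mathcal{T}
 &\leadsto_{nt} (\mathcal{T}, [\,], [\varnothing])
  \overset{\text{S}}{\leadsto}{}_{nt}^{3} (\mathcal{T}, [\tau_i, \tau_0, \tau_1], \varnothing^{4})
  \overset{\text{A}}{\leadsto}_{nt} (S_1, [\tau_i, \tau_{01}], \varnothing^{2} :: \{\tau_{01}\}) \\
 &\overset{\text{S}}{\leadsto}_{nt} (S_1, [\tau_i, \tau_{01}, \tau_2], \varnothing^{2} :: \{\tau_{01}\} :: \varnothing)
  \overset{\text{A}}{\leadsto}_{nt} (S_2, [\tau_i, \tau_{012}], \varnothing^{2} :: \{\tau_{01}, \tau_{012}\}) \\
 &\overset{\text{S}}{\leadsto}_{nt} (S_2, [\tau_i, \tau_{012}, \tau_1], \varnothing^{2} :: \{\tau_{01}, \tau_{012}\} :: \varnothing)
  \overset{\text{C}}{\leadsto}_{nt} (S_2, [\tau_i, \tau_{012}], \varnothing^{2} :: \{\tau_{01}, \tau_{012}, \tau_1\}) \\
 &\overset{\text{B}}{\leadsto}_{nt} (S_2, [\tau_i], \varnothing :: \{\tau_{012}\})
  \leadsto_{nt}^{*} (S_2, [\tau_i], \varnothing :: \{\tau_{012}, \tau_0, \tau_{01}\})
  \overset{\text{B}}{\leadsto}_{nt} (S_2, [\,], [\{\tau_i\}])
  \overset{\text{P}}{\leadsto}_{nt} \text{safe}
\end{align*}

After three STEPs, we accelerate the recursive suffix $[\tau_0, \tau_1]$ of the
trace, resulting in $\tau_{01} = \ell \to \ell\,[x = 0 \land x' = 2]$ and
$S_1 = \mathcal{T} \cup \{\tau_{01}\}$. After one more step,
$[\tau_{01}, \tau_2]$ is accelerated to
$\tau_{012} = \ell \to \ell\,[x = 0 \land x' = 1]$ and we get
$S_2 = S_1 \cup \{\tau_{012}\}$. After the next step, $[\tau_{012}, \tau_1]$ is
redundant w.r.t. $\tau_{01}$, so COVERED applies. Then we BACKTRACK, as no other
transitions are active. The next STEPs also yield states that allow for
backtracking (as their traces have the redundant suffixes $[\tau_0, \tau_1]$ and
$[\tau_{01}, \tau_2]$), so we can finally apply BACKTRACK again and finish with
PROVE.

Note that whenever the trace has a recursive suffix, then it leads from
$\ell(i)$ to $\ell(j)$ where $i \neq j$, i.e., each such suffix is trivially
terminating. In particular, the cycle
$\ell(1) \to_{\tau_1} \ell(2) \to_{\tau_2} \ell(1)$ is not apparent in any of the
states.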
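The arithmetic of Example 16 is easy to replay. The sketch below is only an illustration under simplifying assumptions: each guard of the form x = a ∧ x' = b is modelled as a pair (a, b), acceleration is approximated by plain chaining (which yields the transitions stated in the example, since none of the chained transitions can be iterated), and all names (`TAU`, `chain`, `run`) are hypothetical.

```python
# Loop transitions of Example 16, modelled as (value of x, value of x').
TAU = {"tau_0": (0, 1), "tau_1": (1, 2), "tau_2": (2, 1)}


def chain(first, second):
    """Compose two (pre, post) transitions; None if they cannot be chained."""
    if first is None or second is None:
        return None
    return (first[0], second[1]) if first[1] == second[0] else None


# Learned transitions, matching the values given in the example.
tau_01 = chain(TAU["tau_0"], TAU["tau_1"])    # x = 0 and x' = 2
tau_012 = chain(tau_01, TAU["tau_2"])         # x = 0 and x' = 1

# [tau_012, tau_1] behaves exactly like tau_01, hence COVERED applies.
covered = chain(tau_012, TAU["tau_1"]) == tau_01


def run(x, steps):
    """Follow the original loop transitions from value x for up to `steps` steps."""
    visited = [x]
    for _ in range(steps):
        successors = [post for (pre, post) in TAU.values() if pre == x]
        if not successors:
            break
        x = successors[0]
        visited.append(x)
    return visited


cycle = run(1, 4)
```

Every learned transition maps x to a different value, so no recursive suffix on the trace looks non-terminating, while `run(1, 4)` keeps alternating between 1 and 2.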


This example reveals a fundamental problem when adapting ADCL for prov- 
ing termination: ADCL ensures that all reachable configurations are covered, 
which is crucial for proving safety, but there are no such guarantees for all runs. 
Therefore, we think that adapting ADCL for proving termination requires major 
changes. 


5 Related Work and Experiments 


We presented ADCL-NT, a variant of ADCL for proving non-termination. The 
key insight is that tightly integrating techniques to detect non-terminating tran- 
sitions into ADCL allows for handling classes of TSs that are challenging for 
other techniques. In particular, ADCL-NT can find non-terminating executions 
involving disjunctive transitions or complex patterns of transitions. Moreover, 
it tightly couples the search for non-terminating configurations and the proof of 
their reachability, whereas other approaches usually separate these two steps. 


Related Work. There are many techniques to find certificates of non- 
termination [2,14, 15,22,23,25]. We could use any of them (they are black boxes 
for ADCL-NT). 

Most non-termination techniques for TSs first search for non-terminating 
configurations, and then prove their reachability [5,6,9,22], or they extract and 
analyze lassos [23]. In contrast, ADCL-NT tightly integrates the search for non- 
terminating configurations and reachability analysis. 

Earlier versions of our tool LoAT [12,15] also interleaved both steps using a 
technique akin to the state elimination method to transform finite automata to 
regular expressions. This technique cannot handle disjunctions, and it is incom- 
plete for reachability. Hence, LoAT is now solely based on ADCL-NT. 


Implementation. So far, our implementation in our tool LoAT is restricted to 
integer arithmetic. It uses the technique from [15] for acceleration and finding 
certificates of non-termination, the SMT solvers Z3 [26] and Yices [11], the recur- 
rence solver PURRS [1], and libFAUDES [24] to implement the automata-based 
redundancy check from [16]. 


Experiments. To evaluate our implementation in LoAT, we used the 1222 Integer
Transition Systems (ITSs) and the 335 C Integer Programs from the Termination
Problems Database [28] used in TermComp [21]. The C programs are
small, hand-crafted examples that often require complex proofs. The ITSs are
significantly larger, as they were obtained from automatic transformations of C 
or Java programs. Moreover, they contain a lot of “noise”, e.g., branches where 
termination is trivial or variables that are irrelevant for (non-)termination. Thus, 
they are well suited to test the scalability and robustness of the tools. 

We compared our implementation (LoAT ADCL) with other leading termina- 
tion analyzers: iRankFinder [2,9], T2 [6], Ultimate [8], VeryMax [3,22], and the 
previous version of LoAT [15] (LoAT '22). For T2, VeryMax, and Ultimate, we 
took the versions of their last TermComp participations (2015, 2019, and 2022). 
For iRankFinder, we used the configuration from the evaluation of [15], which 
is tailored towards proving non-termination. We excluded AProVE [20], as it 
cannot prove non-termination of ITSs, and it uses LoAT and T2 as backends 
when analyzing C programs. Moreover, we excluded Ultimate from the evaluation 
on ITSs, as it cannot parse them. All experiments were run on StarExec [27] with 
300 s wallclock timeout, 1200 s CPU timeout, and 128 GB memory limit per 
example. 


             |      No       |  Yes   |      Runtime overall       |   Runtime No
             | solved unique | solved | average  median  timeouts  | average  median
LoAT ADCL    |   521     9   |     0  |  48.6 s   0.1 s     183    |   2.9 s   0.1 s
LoAT '22     |   494         |     0  |   7.4 s   0.1 s       0    |   6.2 s   0.1 s
T2           |   442         |   615  |  17.2 s   0.6 s      45    |   7.4 s   0.6 s
VeryMax      |   421         |   631  |  28.3 s   0.5 s      30    |  30.5 s  14.5 s
iRankFinder  |   409         |   642  |  32.0 s   2.0 s      93    |  12.3 s   1.7 s


The table above shows the results for ITSs, where the column “unique” contains 
the number of examples that could be solved by the respective tool, but no others. 
It shows that LoAT ADCL is the most powerful tool for proving non-termination 
of ITSs. The main reasons for the improvement are that LoAT ADCL builds upon 
a complete technique for proving reachability (in contrast to, e.g., LoAT '22), and 
the close integration of non-termination techniques into a technique for proving 
reachability, whereas most competing tools separate these steps from each other. 

If we only consider the examples where non-termination is proven, LoAT
ADCL is also the fastest tool. If we consider all examples, then the average
runtime of LoAT ADCL is significantly higher. This is not surprising, as
ADCL-NT does not terminate in general. So while it is very fast in most cases
(as witnessed by the very low median runtime), it times out more often than the
other tools.

For C integer programs, the best tools are very close (VeryMax: 103 × No,
LoAT ADCL: 102 × No, Ultimate: 100 × No). Regarding runtimes, the situation is
analogous to ITSs. See [18] for detailed results, more information about our
evaluation, and a pre-compiled binary. LoAT is open-source and available on 
GitHub [17].


Abstract. There is a wide range of modal logics whose semantics goes
beyond relational structures, and instead involves, e.g., probabilities,
multi-player games, weights, or neighbourhood structures. Coalgebraic
logic serves as a unifying semantic and algorithmic framework for such
logics. It provides uniform reasoning algorithms that are easily
instantiated to particular, concretely given logics. The COOL 2 reasoner
provides an implementation of such generic algorithms for coalgebraic modal
fixpoint logics. As concrete instances, we obtain in particular reasoners
for the aconjunctive and alternation-free fragments of the graded
μ-calculus and the alternating-time μ-calculus. We evaluate the tool
on standard benchmark sets for fixpoint-free graded modal logic and
alternating-time temporal logic (ATL), as well as on a dedicated set of
benchmarks for the graded μ-calculus.


1 Introduction 


Modal and temporal logics are established tools in the specification and verifica- 
tion of systems. While many such logics are interpreted over relational transition 
systems, the semantics of quite a number of important logics goes beyond the 
relational setup, involving, for instance, probabilities [20,30], concurrent games 
as in alternating-time logics [1,36], monotone neighbourhood structures as in
game logic [34] and concurrent dynamic logic [37], or integer transition weights as
in the multigraph semantics [5] of the graded μ-calculus [25]. Coalgebraic logic [4]
provides a uniform semantic and algorithmic framework for these logics, based
on the paradigm of universal coalgebra [38]. It provides reasoning algorithms
of optimal complexity at various levels of expressiveness, up to the coalgebraic
μ-calculus [3,21-23]. These algorithms are parametric in the transition type of
systems (weighted, probabilistic, game-based etc.) as well as in suitable choices 
of modalities specific to the given system type. Their instantiation to specific 
logics requires providing either a set of next-step modal tableau rules satisfying 
a suitable completeness criterion [41] or, more generally, a plug-in algorithm that 
determines satisfiability for an extremely simple one-step logic that describes the 
interaction between modalities, and consists of (conjunctions of) modal opera- 
tors applied to variables only [29]. 

The COalgebraic Ontology Logic solver (COOL) provides reasoning support 
for coalgebraic logics based on these generic algorithms. The first version of the 
tool [15] provided reasoning support for fixpoint-free coalgebraic hybrid logic 
with global assumptions, using a global caching principle [13]. In the present 
paper, we present COOL 2, which provides reasoning support for coalgebraic
fixpoint logics, specifically for both the aconjunctive fragment and the
alternation-free fragment of the coalgebraic μ-calculus. By instantiation, we
obtain in particular the first implemented reasoners for the graded μ-calculus
[26] (for which a set of coalgebraic modal tableau rules has been described in
the literature [41]; however, this rule set has later turned out to be
incomplete, cf. Remark 2.3) and the alternating-time μ-calculus [1]. We describe
the structure of the tool including implementational details, and present
evaluation results, focusing on
the graded μ-calculus and alternating-time temporal logic (ATL). Additional
details on the evaluation can be found in the full version [17]. 


Related Work: We have already mentioned work in coalgebraic logic on which
COOL is based [3,13,21-23,41]. COOL is conceptually a successor of the
Coalgebraic Logic Satisfiability Solver (CoLoSS) [2] but does not share any of
its code. CoLoSS implements fixpoint-free logics, and is entirely unoptimised.
The first version of COOL [15] has been evaluated on fixpoint-free next-step
logics.

COOL also covers various relational modal logics, for which there are
numerous specialised reasoners, including highly optimised description logic
reasoners such as FaCT++ [44], Pellet [42], RACER [18], and HermiT [12]. As these
systems do not support fixpoint logics, a comparison would be of limited value.
In previous work, COOL has been evaluated on various relational fixpoint
logics, and has been shown to perform favourably on Computation Tree Logic [23]
(in comparison to reasoners featured in a previous systematic evaluation [14]),
as well as on the aconjunctive fragment of the modal μ-calculus [22] (in
comparison to MLSolver [11]). A reasoner for (next-step) graded modal logic has
been evaluated against various description logic reasoners [43], using however
the above-mentioned incomplete set of modal tableau rules.

For the same reasons, we refrain from evaluating COOL 2 against reasoners
for coalition logic, i.e. the fixpoint-free fragment of the alternating-time
μ-calculus, such as CLProver [32]. The only implemented reasoner for any fragment
of the alternating-time μ-calculus that does include fixpoints still appears to be
the tableau reasoner TATL for alternating-time temporal logic [6,7]. TATL has
been compared to COOL on random formulas in previous work [23].


2 Satisfiability in the Coalgebraic μ-Calculus


COOL 2 is a satisfiability checker for the coalgebraic μ-calculus [3], that is, for
the extension of coalgebraic modal logic with extremal fixpoint operators.
Formulas of this logic are interpreted over coalgebras, where the semantics of
modal operators is defined by means of so-called predicate liftings [41]; we
recapitulate examples of system types and modalities subsumed by this paradigm
in Example 2.1.


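Syntax: Formulas are built relative to a set Var of fixpoint variables and a modal
similarity type $\Lambda$, that is, a set of modal operators with assigned finite
arities that is closed under duals, with $\overline{\heartsuit} \in \Lambda$
denoting the dual of $\heartsuit \in \Lambda$. Formulas $\psi, \phi, \ldots$ of
the coalgebraic μ-calculus over $\Lambda$ are given by the grammar

\[
\psi, \phi ::= \top \mid \bot \mid \psi \land \phi \mid \psi \lor \phi \mid
\heartsuit(\psi_1, \ldots, \psi_n) \mid X \mid \mu X.\,\psi \mid \nu X.\,\psi
\]

where $\heartsuit \in \Lambda$ has arity n and $X \in \mathsf{Var}$. A formula
$\chi$ is aconjunctive if for every conjunction $\psi \land \phi$ that is a
subformula of $\chi$, at most one of the formulas $\psi$ and $\phi$ contains a
free fixpoint variable X that is bound by a least fixpoint operator $\mu X$.
While the logic does not contain negation as an explicit operator, full negation
can be defined as usual; e.g. we have
$\neg\heartsuit\psi = \overline{\heartsuit}\neg\psi$ and
$\neg\mu X.\,\psi = \nu X.\,\neg\psi[\neg X/X]$, using $\neg\neg X = X$.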

Both the theoretical satisfiability checking algorithm and its implementa- 
tion in COOL 2 operate on the Fischer-Ladner closure [21,24,27] of the target 
formula. The alternation depth (e.g. [21,29,33]) of a formula is the maximum 
depth of dependent alternating nestings of least and greatest fixpoints within 
the formula. Formulas with alternation depth 1 are alternation-free. 


Semantics: Formulas are interpreted over F-coalgebras, that is, structures
$(C, \xi\colon C \to FC)$,


where $F\colon \mathsf{Set} \to \mathsf{Set}$ is a functor determining the
branching type of the systems at hand; thus $\xi(x) \in FC$ encodes the
transitions from $x \in C$, structured according to F. Modalities
$\heartsuit \in \Lambda$ of arity n are interpreted as predicate liftings, that
is, families of maps
$[\![\heartsuit]\!]_U\colon (2^{U})^{n} \to 2^{FU}$ (for $U \in \mathsf{Set}$)
that assign predicates on FU to n-tuples of predicates on U, subject to a
naturality condition [35,40]. On a coalgebra $(C, \xi)$, the semantics of
formulas is defined inductively in the usual way for the propositional operators
and fixpoints, and by
$[\![\heartsuit(\psi_1, \ldots, \psi_n)]\!] = \xi^{-1}[[\![\heartsuit]\!]_C([\![\psi_1]\!], \ldots, [\![\psi_n]\!])]$
for modalities.

A closed formula $\psi$ is satisfiable if there is a coalgebra $(C, \xi)$ and a
state $x \in C$ such that $x \in [\![\psi]\!]$. A formula $\psi$ is valid if
$\neg\psi$ is not satisfiable.


Example 2.1. (1) The standard modal μ-calculus [24] is obtained using the
functor $F = \mathcal{P}(\mathsf{A}) \times \mathcal{P}$, where $\mathsf{A}$ is a
fixed set of atoms, the similarity type
$\Lambda = \{\Diamond, \Box, a, \neg a \mid a \in \mathsf{A}\}$, and predicate
liftings

\begin{align*}
[\![\Diamond]\!]_C(B) &= \{(A, Z) \in 2^{\mathsf{A}} \times 2^{C} \mid Z \cap B \neq \varnothing\} &
[\![a]\!]_C &= \{(A, Z) \in 2^{\mathsf{A}} \times 2^{C} \mid a \in A\} \\
[\![\Box]\!]_C(B) &= \{(A, Z) \in 2^{\mathsf{A}} \times 2^{C} \mid Z \subseteq B\} &
[\![\neg a]\!]_C &= \{(A, Z) \in 2^{\mathsf{A}} \times 2^{C} \mid a \notin A\}
\end{align*}

The expressive power of the modal μ-calculus is demonstrated by the formulas

\[
\mu X.\,\nu Y.\,(p \land \Diamond Y) \lor \Diamond X \qquad
\nu X.\,\mu Y.\,(p \land \Diamond X) \lor \Diamond Y.
\]

The former is a co-Büchi formula expressing the existence of a path on
which p holds forever, from some point on; the latter formula expresses the
Büchi property that there is a path on which the atom p is satisfied infinitely
often.

(2) The graded μ-calculus [26] allows expressing quantitative properties with
the help of modal operators $\langle n \rangle$ and $[n]$, $n \in \mathbb{N}$;
formulas $\langle n \rangle\psi$ and $[n]\psi$ then have the intuitive meaning
that 'there are more than n successor states that satisfy $\psi$', and 'all but
at most n successor states satisfy $\psi$', respectively. Its coalgebraic
interpretation is based on multigraphs, which are coalgebras for the multiset
functor [5]. A graded variant of the above Büchi property is specified, e.g., by
the formula $\nu X.\,\mu Y.\,(p \land \langle n \rangle X) \lor \langle n \rangle Y$,
which expresses the existence of an infinite (n+1)-ary tree such that the atom p
is satisfied infinitely often on every path in the tree.

(3) The alternating-time μ-calculus (AMC) [39] extends coalition logic [36] with
fixpoints and (modulo syntax) supports modalities $\langle D \rangle$ and $[D]$,
where $D \subseteq N$ is a coalition formed by agents from the set
$N = \{1, \ldots, n\}$ for some fixed $n \in \mathbb{N}$; formulas
$\langle D \rangle\psi$ and $[D]\psi$ then state that 'coalition D has a joint
strategy to enforce $\psi$' and that 'coalition D cannot prevent $\psi$',
respectively. For instance, the formula
$\nu X.\,\mu Y.\,\nu Z.\,(p \land \langle D \rangle X) \lor (q \land \langle D \rangle Y) \lor (\neg q \land \langle D \rangle Z)$
expresses that coalition D has a joint multi-step strategy that guarantees
that p is visited infinitely often whenever q is visited infinitely often.
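To see how the Büchi and co-Büchi formulas from item (1) behave on a concrete system, one can evaluate them on a small finite Kripke structure by naive fixpoint iteration. This sketch is unrelated to the satisfiability algorithm implemented in COOL 2; it is a plain model-checking illustration, and the names `pre_exists`, `lfp`, `gfp`, `buchi`, and `co_buchi` are hypothetical.

```python
def pre_exists(succ, target):
    """States with at least one successor in `target` (semantics of the diamond)."""
    return {s for s, nexts in succ.items() if nexts & target}


def lfp(f):
    """Least fixpoint of a monotone function on finite sets, iterating from the empty set."""
    current = set()
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


def gfp(f, universe):
    """Greatest fixpoint of a monotone function, iterating from the full state set."""
    current = set(universe)
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


def buchi(succ, p):
    """nu X. mu Y. (p and <>X) or <>Y: some path visits p infinitely often."""
    return gfp(lambda X: lfp(lambda Y: (p & pre_exists(succ, X)) | pre_exists(succ, Y)),
               set(succ))


def co_buchi(succ, p):
    """mu X. nu Y. (p and <>Y) or <>X: some path satisfies p forever from some point on."""
    return lfp(lambda X: gfp(lambda Y: (p & pre_exists(succ, Y)) | pre_exists(succ, X),
                             set(succ)))


# States 0 and 1 form a cycle, state 2 has a self-loop, state 3 leads to 2.
succ = {0: {1}, 1: {0}, 2: {2}, 3: {2}}
on_cycle = buchi(succ, {1})
forever = co_buchi(succ, {1})
```

With p holding only in state 1, the Büchi formula holds exactly on the cycle between 0 and 1, whereas the co-Büchi formula holds nowhere because no path stays inside p.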


Satisfiability Checking: We proceed to recall the satisfiability checking algorithm
for the coalgebraic μ-calculus that forms the basis of the implementation within
COOL 2. This algorithm adapts the automata-based approach to satisfiability
checking for the standard μ-calculus, and generalises the treatment of modal
steps by parametrizing over a solver for the one-step satisfiability problem of the
logic, which concerns satisfiability of formulae with exactly one layer of next-step
modalities [21]. It thus avoids the necessity of tractable sets of tableaux rules
for modal operators. Under mild assumptions on the complexity of the one-step
satisfiability problem of the base logic at hand ('tractability'), the algorithm
witnesses a, typically optimal, upper bound EXPTIME for the complexity of
the satisfiability problem; unlike a previous algorithm [4], the algorithm thus
has optimal runtime also in cases where no tractable sets of modal tableaux
rules are known, such as the graded (or, more generally, Presburger) μ-calculus
(further cases of this kind include the probabilistic μ-calculus with polynomial
inequalities [21] and the unrestricted form of the alternating-time μ-calculus with
disjunctive explicit strategies [16]).

The algorithm constructs and solves a parity game that characterises
satisfiability of the input formula $\chi$. In this game one player attempts to
construct a tableau structure for $\chi$ while the opposing player attempts to
refute the existence of such a structure. Modal steps in this tableau
construction are treated by using instances of the one-step satisfiability
problem for the logic at hand, thereby generalising traditional modal tableau
rules. The winning condition of the game is encoded by a non-deterministic
parity automaton $A_\chi$, reading infinite words that encode sequences of
step-wise formula evaluations (so-called formula traces) within a coalgebra;
such words encode branches in the constructed tableau structure. Conjunctions
give rise to nondeterminism in this automaton, and the parity condition of the
automaton is used to accept exactly those words that encode sequences of formula
evaluations in which some least fixpoint is unfolded infinitely often. To use
the language accepted by $A_\chi$ as the winning condition in a parity game, we
transform $A_\chi$ to an equivalent deterministic parity automaton $B_\chi$.
This automaton then is paired with the tableau construction to yield a parity
game in which the existential player aims to show the existence of a tableau
structure in which all branches are rejected by $B_\chi$, and that is built
in such a way that modalities always are jointly one-step satisfiable. To ensure
the latter property, the modal moves in the game invoke instances of the one-step
satisfiability problem of the base logic. For more details on one-step satisfiability
and the overall algorithm, see [17,21].


Corollary 2.2 ([21]). Suppose that the one-step satisfiability problem is
tractable. Then the satisfiability problem of the corresponding instance of the
coalgebraic μ-calculus is in EXPTIME.


Remark 2.3. As mentioned above, previous algorithms for the coalgebraic
μ-calculus (also implemented in COOL 2) rely on complete sets of modal
tableau rules, specifically on one-step cutfree complete sets of so-called
one-step rules [41]; such rules (in their incarnation as tableau rules) have a
premiss with exactly one layer of modal operators and a purely propositional
conclusion. A typical example is the usual tableau rule for the modal logic K:
'To satisfy $\Box a_1 \land \cdots \land \Box a_n \land \neg\Box a_0$, satisfy
$a_1 \land \cdots \land a_n \land \neg a_0$'. It has been shown
that the existence of a tractable one-step cutfree complete set of one-step rules
implies tractability of one-step satisfiability [29], i.e. the approach via one-step
satisfiability is more general.
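As a toy illustration of the K rule quoted above, the following sketch maps a premiss of the stated shape to its propositional conclusion and checks that conclusion for consistency. It assumes that all arguments of the boxes are plain variables, as in a one-step logic; the function names are hypothetical and this is not how COOL 2 represents rules.

```python
def k_rule_conclusion(boxed, negated_box):
    """From []a1 & ... & []an & ~[]a0, derive the literals a1, ..., an, ~a0.

    Literals are (variable, polarity) pairs; True means positive.
    """
    return [(a, True) for a in boxed] + [(negated_box, False)]


def literals_satisfiable(literals):
    """A conjunction of literals is satisfiable iff no variable has both polarities."""
    required = {}
    for var, polarity in literals:
        # setdefault records the first polarity seen and returns it afterwards.
        if required.setdefault(var, polarity) != polarity:
            return False
    return True


consistent = literals_satisfiable(k_rule_conclusion(["a1", "a2"], "a0"))
clashing = literals_satisfiable(k_rule_conclusion(["a1", "a0"], "a0"))
```

In this restricted setting the conclusion is satisfiable exactly when the negated variable does not also occur under a box, which is why `clashing` evaluates to a failed check.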

As indicated in the introduction, a tractable one-step cutfree complete set of 
one-step rules for graded modal logic has been claimed in the literature [41,43] 
but has since turned out to be incomplete; we give a counterexample in the full 
version [17]. (A similar rule for Presburger modal logic [28] has also been shown 
to be in fact incomplete [29].) 


3 Implementation 


The previous version COOL [15] only implements fixpoint-free (coalgebraic) log- 
ics, such as standard modal logic, probabilistic modal logic, or coalition logic. 
The main novelties of the new version COOL 2, described here, are


— the addition of fixpoint constructs to the previously implemented logics,
supporting alternation-free and aconjunctive fragments of the resulting μ-calculi,
and implementing on-the-fly solving to allow early termination
— support for treating modal steps both by tableaux rules (when a suitable rule
set exists), and by one-step satisfiability checking (in the remaining cases) 


In more detail, COOL 2 is written in OCaml and implements the satisfiability
checking algorithm described in Sect. 2, treating modal steps by solving
instances of the one-step satisfiability problem¹. For logics where a suitable set of
modal tableau rules is implemented, those are used for the treatment of modal
steps, rather than relying on one-step satisfiability (unless the user explicitly
chooses otherwise); in these cases, COOL 2 essentially implements the algorithm
described in [29]. The current implementation supports the alternation-free
and the aconjunctive fragments of the standard μ-calculus (both serial and
non-serial), the monotone μ-calculus [19], the alternating-time μ-calculus (i.e.
coalition logic with fixpoint operators), and the graded μ-calculus. Tractable
tableaux rules are available for all cases except for the graded μ-calculus, for
which COOL 2 uses the one-step satisfiability algorithm to decide satisfiability.
In particular, COOL 2 is the only existing reasoner for the graded μ-calculus (as
well as the only reasoner covering the alternating-time μ-calculus beyond ATL).

The concrete logic used can be selected via a command-line parameter, which
sets up the data structures in COOL 2 accordingly before parsing and checking
the syntax of the given formula $\chi$. COOL 2 then builds the determinised
automaton $B_\chi$, yielding the parity game described above in a step-wise manner,
repeatedly adding nodes in expansion steps that explore the game. In the
case of simpler alternation-free formulas, the Miyano-Hayashi method [31] is
used to construct $B_\chi$, resulting in asymptotically smaller games with a Büchi
winning condition; for the more involved aconjunctive formulas, the implementation
uses the permutation method for determinisation of limit-deterministic
parity automata [9,22]. Nodes in the constructed game are marked as either
unexpanded, undecided, unsatisfiable, or satisfiable.

Optional solving steps may take place at any point during the construction of 
$B_\chi$, depending on runtime parameters of COOL 2; these steps compute the winning
regions of the partial game that has been constructed so far and accordingly
mark nodes as satisfiable or unsatisfiable, if possible. The reasoner terminates 
as soon as the initial node is marked satisfiable or unsatisfiable. If this does 
not allow for early termination, the game eventually becomes fully explored, at 
which point a final (obligatory) solving step for the complete game is guaranteed 
to mark the initial node, thereby ensuring termination. 

We detail the implementation of the two main procedures within COOL 2. 


Implementation of Expansion Steps. The propositional expansion steps in the 
game construction for nodes v are performed using the propositional satisfiabil- 
ity solver MiniSat [8] to compute a word that encodes consistent propositional 
formula manipulations for v. Afterwards, the successor of v in $B_\chi$ under this
word is computed and added to the game. 

When the one-step satisfiability based algorithm of COOL 2 is used, modal
expansion steps for nodes v create new game nodes for each subset $\kappa$ of the
modalities that are to be jointly satisfied at v; this is done by computing the
successor of v in $B_\chi$ that is reached by manipulating each formula from $\kappa$.


¹ Sources are available at https://git8.cs.fau.de/software/cool.


When the tableau-based algorithm of COOL 2 is used, the modal expansion 
step for a node v instead computes all applications of a modal rule matching v 
and inserts, for each such rule application, and each conjunctive clause $\kappa$ in the
conclusion of the rule application, the new game node that is reached from v
in $B_\chi$ by manipulating the modalities that constitute $\kappa$. Intuitively, using tableau
rules reduces the search space by only adding nodes found in the conclusion of 
some matching rule application. 

Any node that is added by some expansion step is initially marked as unde- 
cided. Crucially, all expansion steps perform on-the-fly determinisation, that is, 
given a game node v and a word that encodes a sequence of formula manipula- 
tions, the newly added game node is computed using only the information stored 
in v. 


Implementation of Solving Steps. A single solving step computes the winning 
regions in the parity game that has been constructed up to this point, and marks 
nodes accordingly. The game solving is done using either the parity game solver 
PGSolver [10] or a native implementation provided by COOL 2 that solves the 
game by fixpoint iteration. 
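
To illustrate what solving a game by fixpoint iteration can look like, the following Python sketch computes the winning region of a Büchi game as a nested fixpoint. It is not taken from COOL 2 (which is written in OCaml); the function names `cpre` and `solve_buchi`, the dictionary-based game representation, and the convention that a player without moves loses are assumptions made for this example only.

```python
def cpre(owner, succ, target):
    """Nodes from which player 0 can force the next node into `target`.

    Assumed convention: a player who has no move loses immediately.
    """
    result = set()
    for v, moves in succ.items():
        if owner[v] == 0:
            forced = any(w in target for w in moves)
        else:
            forced = all(w in target for w in moves)
        if forced:
            result.add(v)
    return result


def solve_buchi(owner, succ, accepting):
    """Winning region of player 0: nu X. mu Y. (accepting & cpre(X)) | cpre(Y)."""
    x = set(succ)
    while True:
        goal = accepting & cpre(owner, succ, x)
        y = set()
        while True:                      # inner least fixpoint
            new_y = goal | cpre(owner, succ, y)
            if new_y == y:
                break
            y = new_y
        if y == x:                       # outer greatest fixpoint reached
            return x
        x = y


# Toy game: node a belongs to player 0, nodes b and c to player 1.
owner = {"a": 0, "b": 1, "c": 1}
succ = {"a": ["b", "c"], "b": ["a"], "c": ["c"]}
winning = solve_buchi(owner, succ, accepting={"b"})
print(sorted(winning))  # ['a', 'b']
```

In the toy game, player 0 wins from a and b by always moving from a to the accepting node b, while c only loops through a non-accepting node.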

If the one-step satisfiability-based algorithm is used, an assigned modal node 
v is satisfiable if its modalities are jointly one-step satisfiable in those successors 
of v that are satisfiable themselves. An enumerative representation of the game
thus contains existential moves to all subsets $\Pi$ of subsets of modalities of v that
are sufficiently large for one-step satisfaction of the modalities of v, followed
by universal moves to nodes induced by any $\kappa \in \Pi$; the full game thus is of
doubly-exponential size. This can be avoided by inlining the modal steps, thereby
evading the intermediate nodes $\Pi$. The winning region can then be computed
in single-exponential time by using COOL 2’s native fixpoint iteration over a 
function that computes the two-tiered modal steps in one go. 

Decision procedures for the one-step satisfiability problems in the relational 
and the graded case are implemented in COOL 2 along the lines of the algorithms 
described in [21, Example 6] (in the graded case, nondeterministic guessing is 
replaced with a recursive search procedure). 

If the algorithm based on modal tableau rules is used, the treatment of modal 
steps follows the tableaux-based algorithm that is given in [3]. States v are sat- 
isfiable if for all rule applications that match v, the conclusion of the application 
contains a conjunctive clause $\kappa$ such that the node induced by $\kappa$ is satisfiable.

COOL 2 also allows the user to specify the desired frequency of optional 
game solving steps, including the options once and adaptive. With the option 
once, no intermediate solving takes place so that the game is fully constructed 
and solved just once, at the very end of the execution. With the option adaptive, 
intermediate solving takes place, but the frequency of solving decreases as the size
of the constructed graph increases; this option implements on-the-fly solving and 
allows for finishing early in cases where a small model or refutation exists. 
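
The interplay of expansion and optional solving steps can be sketched as follows. This Python fragment is a simplified illustration, not COOL 2's implementation: the "game" is a plain graph in which a node counts as satisfiable if it reaches a good node, the function name `decide` and its parameters are hypothetical, and the doubling schedule used for the adaptive mode is an assumption (the text above only states that solving becomes less frequent as the graph grows).

```python
from collections import deque


def decide(initial, successors, is_good, mode="adaptive"):
    """Explore step by step; return (verdict, number of expanded nodes)."""
    edges = {}                      # expanded nodes with their successors
    queue = deque([initial])
    seen = {initial}
    next_solve = 1                  # graph size that triggers the next solving step

    def solve(final):
        # Mark nodes that reach a good node via edges explored so far.
        sat = {v for v in edges if is_good(v)}
        changed = True
        while changed:
            changed = False
            for v, ws in edges.items():
                if v not in sat and any(w in sat for w in ws):
                    sat.add(v)
                    changed = True
        if initial in sat:
            return "satisfiable"
        # A negative verdict is only sound once the graph is fully explored.
        return "unsatisfiable" if final else None

    while queue:
        v = queue.popleft()
        edges[v] = list(successors(v))          # expansion step
        for w in edges[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
        if mode == "adaptive" and len(edges) >= next_solve:
            next_solve = 2 * len(edges)         # solve less often as the graph grows
            verdict = solve(final=False)        # optional solving step
            if verdict is not None:
                return verdict, len(edges)      # early termination
    return solve(final=True), len(edges)        # final, obligatory solving step


def chain(v):
    """Successor function of a chain 0 -> 1 -> ... -> 99."""
    return [v + 1] if v < 99 else []


print(decide(0, chain, lambda v: v == 3, mode="adaptive"))  # ('satisfiable', 4)
print(decide(0, chain, lambda v: v == 3, mode="once"))      # ('satisfiable', 100)
```

With the adaptive mode, the toy run stops after four expansions because a small witness exists; with the once mode, the whole graph is built before the single solving step.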


4 Evaluation


We conduct experiments in order to evaluate the performance of the various 
algorithms implemented in COOL in comparison with each other, as well as 
in comparison with other tools (where applicable). Complete definitions of all 
formula series used in the evaluation as well as additional experimental results 
can be found in the full version [17]. 


Experiments: In a first experiment, we compare COOL 2 with the established 
reasoner FaCT++, which supports the description logic SROIQ(D) (subsuming
fixpoint-free graded modal logic), using the following series of formulas from 
Snell et al. [43]. 


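\[ \mathsf{Cardinality}(n) := \langle n-1\rangle\neg p \wedge \langle n-1\rangle p \wedge [n]\neg q \wedge [n]q \qquad \text{(Sat)} \]
\[ \mathsf{CardinalityU}(n) := \langle n-1\rangle\neg p \wedge \langle n-1\rangle p \wedge [n]\neg q \wedge [n-1]q \qquad \text{(UnSat)} \]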


Intuitively, the satisfiable Cardinality(n) formulas express that there are at least
2n successors and that both $q$ and $\neg q$ are satisfied in at most n successors,
each; similarly the unsatisfiable CardinalityU(n) formulas state that there are at
least 2n successors, and that $q$ and $\neg q$ hold in at most n and n − 1 successors,
respectively; the latter statements imply that there are at most 2n − 1 successors,
yielding a contradiction.
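
The counting argument can be checked by brute force for small n. The sketch below assumes the usual reading of graded modalities (⟨k⟩φ: more than k successors satisfy φ; [k]φ: at most k successors falsify φ) and searches over the number of successors of each of the four propositional types. The function names are hypothetical, and the search is an illustration only, not the one-step satisfiability algorithm used by COOL 2.

```python
from itertools import product


def satisfiable(min_p, min_not_p, max_q, max_not_q, bound):
    """Search successor counts for the types (p,q), (p,~q), (~p,q), (~p,~q)."""
    for pq, pnq, npq, npnq in product(range(bound + 1), repeat=4):
        if (pq + pnq >= min_p and npq + npnq >= min_not_p
                and pq + npq <= max_q and pnq + npnq <= max_not_q):
            return True
    return False


def cardinality(n):
    # <n-1>~p and <n-1>p: at least n successors each; [n]~q and [n]q: at most n each.
    return satisfiable(n, n, n, n, bound=2 * n)


def cardinality_u(n):
    # As above, but ~q may hold in at most n - 1 successors.
    return satisfiable(n, n, n, n - 1, bound=2 * n)


for n in range(1, 5):
    print(n, cardinality(n), cardinality_u(n))
```

The bound 2n suffices because the two upper bounds together limit the total number of successors to at most 2n.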

Going beyond next-step formulas, we continue by devising various complex 
series of graded μ-calculus formulas that involve (nested) fixpoints and express
non-trivial properties of graded trees, automata and games. 


— We obtain a series of unsatisfiable formulas by requiring the existence of an
(n + 1)-branching tree in which p holds everywhere while at the same time
requiring that this tree contains some state with n + 2 successors that satisfy
p:


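\[ \mathsf{TreeU}(n) := (\nu X.\, \langle n\rangle(p \wedge X) \wedge [n+1]\neg p) \wedge (\mu Y.\, \langle n+1\rangle p \vee \langle n\rangle(p \wedge Y)) \qquad \text{(UnSat)} \]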


— Next we turn our attention to graded formulas involving parity conditions. 
We devise a series of valid formulas expressing that graded parity automata 
can be transformed to graded Büchi automata accepting a superlanguage of 
the original automaton: 


\[ \mathsf{ParityToBuechi}(n, k) := \mathsf{Parity}(n, k) \to \mathsf{Buechi}(n, k) \qquad \text{(Valid)} \]


Here, Parity(n, k) encodes parity acceptance with k priorities and grade n 
while Buechi(n, k) expresses Büchi acceptance by a nondeterministic automa- 
ton that eventually guesses the maximal priority that occurs infinitely often; 
the negated formula $\neg$ParityToBuechi(n, k) is unsatisfiable.


² Scripts and executables that allow for reproducing our experiments can be found at
DOI 10.5281/zenodo.8042581.

— Rabin conditions are given by families of pairs $(i_j, f_j)_{j \le k}$ of sets $i_j$, $f_j$ of states,
and express the constraint that there is some $j \le k$ such that states from $i_j$
(infinite) are visited infinitely often and states from $f_j$ (finite) are visited
only finitely often. We can express Rabin conditions with k pairs (and one-step
property $\psi$), Büchi properties and satisfaction of single Rabin pairs by
formulas Rabin(k, $\psi$), Buechi(f, $\psi$) and RabinPair(i, f, $\psi$), respectively. Then
we obtain valid formulas stating that the existence of an (n + 1)-branching tree
that satisfies the Rabin condition on each path implies that there is a path
satisfying a simpler Büchi condition or a single Rabin pair, respectively:


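\[ \mathsf{RabinToBuechi}(k, n) := \mathsf{Rabin}(k, \langle n\rangle) \to \mathsf{Buechi}(i_1 \vee \dots \vee i_k, \langle 0\rangle) \qquad \text{(Valid)} \]
\[ \mathsf{RabinToRPair}(k, n) := \mathsf{Rabin}(k, \langle n\rangle) \to \bigvee_{1 \le j \le k} \mathsf{RabinPair}(i_j, f_j, \langle 0\rangle) \qquad \text{(Valid)} \]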


— Coming to games, we specify the winning regions in graded Büchi and 
Rabin games by formulas BuechiG(f,n) and RabinG(k,n), respectively; in 
such graded games, players are required to have at least n winning moves at 
their nodes in order to win. The following valid formulas then express that 
winning strategies in graded Rabin games with k pairs guarantee that some 
node from $i_1 \cup \dots \cup i_k$ is visited infinitely often:


\[ \mathsf{RabinGame}(k, n) := \mathsf{RabinG}(k, n) \to \mathsf{BuechiG}(i_1 \vee \dots \vee i_k, n) \qquad \text{(Valid)} \]


In a final experiment on alternating-time formulas, we compare COOL 2 
with TATL [6] on the ATL example formulas given in [6] as well as on additional 
formula series. For instance, we turn the formula $\langle\langle 1\rangle\rangle G p \wedge \neg\langle\langle 2\rangle\rangle F \langle\langle 1\rangle\rangle G p$ (written
here using ATL syntax) from [6] into a series Nest(n) with increasing number of 
nested operators; formulas then alternatingly are satisfiable and unsatisfiable: 


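\[ \chi(0) = p \qquad \chi(i+1) = \neg\langle\langle 2\rangle\rangle F \langle\langle 1\rangle\rangle G\, \chi(i) \qquad \mathsf{Nest}(n) = \langle\langle 1\rangle\rangle G p \wedge \chi(n) \]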
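
The recursive definition of the series can be unfolded mechanically. The following Python sketch generates the formulas as strings; the ASCII syntax (`<<1>>` for coalition modalities, `~` for negation, `&` for conjunction) is made up for this illustration and is not claimed to be the input format of COOL 2 or TATL.

```python
def chi(i):
    """Nested part of the series: chi(0) = p, chi(i+1) = ~<<2>>F <<1>>G chi(i)."""
    if i == 0:
        return "p"
    return f"~<<2>>F <<1>>G {chi(i - 1)}"


def nest(n):
    """Nest(n) = <<1>>G p & chi(n), in a hypothetical ASCII syntax."""
    return f"<<1>>G p & {chi(n)}"


for n in range(3):
    print(nest(n))
```

For n = 1 this yields the original formula from [6], written as `<<1>>G p & ~<<2>>F <<1>>G p`.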


Results: We conducted all experiments on a virtual machine with four 2.3 GHz
vCPUs and 8 GB of RAM. We compare with a 64-bit binary of
FaCT++ v1.6.5 and with TATL. We compute all results with a timeout of 60
seconds and average the results over multiple executions. For the execution and
measurement we use hyperfine³. Below, ‘COOL’ and ‘COOL on-the-fly’ refer to
invoking COOL 2 with solving rate once and adaptive, respectively.

Results for the Cardinality and CardinalityU series are shown in Fig. 1 and
Fig. 2, respectively. From n = 10 and n = 8 onwards, respectively, COOL 2 outperforms
FaCT++ considerably. An explanation for this could be that FaCT++ appears
to treat multiplicities in a naive way while COOL 2 employs the more efficient 
one-step satisfiability algorithm. 

Results for the unsatisfiable tree property are shown in Fig. 3. As these formulas
contain fixpoint operators, a comparison with FaCT++ is not possible.
While COOL 2 is generally capable of handling quite large branching factors, 
this experiment showcases the drawbacks of on-the-fly solving in the case that a 
formula cannot be decided early so that repeated attempts of solving the game 
early lead to overhead computations. 


³ https://github.com/sharkdp/hyperfine.

Runtimes for COOL 2 (using on-the-fly solving) on the unsatisfiable series of
parity formulas $\neg$ParityToBuechi(n, k) are shown in Fig. 4. The results indicate
that increasing the number of priorities k has a much stronger effect on the
runtime than increasing multiplicities n in the modalities. This is in accordance
with expectations as increasing k leads to much larger determinised automata
and resulting satisfiability games, while increasing n only complicates the modal
steps in the game while leaving the global game structure unchanged.

Results for the Rabin families of formulas are given in the table below, with †
indicating a timeout of 60s. COOL 2 is able to handle reasonably large formulas
describing Rabin properties of automata and games, with the series for n = 1 
expressing properties of standard automata (solved using tableau rules), and the 
series with n = 2 properties of graded automata with multiplicity 2 (solved using 
one-step satisfiability). 

In accordance with previous experiments on random ATL formulas of larger 
sizes in [23], COOL 2 generally outperforms TATL by a large margin, starting 
from formulas containing at least five modalities or involving nesting of temporal 
operators; this trend is confirmed by Fig. 5, which shows the stepped execution
times for the series Nest that alternates between being satisfiable and unsatisfiable.

series \ k            |  1   |   2   |   3
RabinToBuechi(k, 1)   | 0.03 |  0.51 | 45.25
RabinToBuechi(k, 2)   | 0.08 | 10.56 |   †
RabinToRPair(k, 1)    | 0.03 |  8.38 |   †
RabinToRPair(k, 2)    | 0.07 |   †   |   †
RabinGame(k, 1)       | 0.05 |  1.04 |   †
RabinGame(k, 2)       | 0.31 | 43.94 |   †

[Plot: runtime against depth for COOL on-the-fly, COOL, and TATL]


Fig. 5. Runtimes for the ATL series Nest(n) 


In summary, COOL 2 shows promising performance, both in comparison to TATL
and FaCT++ and with regard to practical applicability. On graded formulas without
fixpoints, COOL 2 scales much better than FaCT++ with regard to increasing
multiplicities. In the presence of fixpoints, COOL 2 still scales well and can handle
multiplicities that should be sufficient for practical use. The formula series
$\neg$ParityToBuechi appears to show the limits of COOL 2 with the current implementation
of graded one-step satisfiability checking. Nonetheless, our results
indicate that COOL 2 is capable of automatically proving or refuting involved
properties of (graded) ω-automata and games in reasonable time.


5 Conclusion 


We have described and evaluated the current version COOL 2 of the COalgebraic
Ontology Logic reasoner (COOL). Future development will include the implementation
of additional instance logics, such as the probabilistic and graded
μ-calculus with linear inequalities, as well as support for the full coalgebraic
μ-calculus via on-the-fly determinisation of unrestricted Büchi automata, using
the Safra-Piterman construction.


Abstract. We present a generic tree-interpolation algorithm in the
SMT context with quantifiers. The algorithm takes a proof of unsatisfi- 
ability using resolution and quantifier instantiation and computes inter- 
polants (which may contain quantifiers). Arbitrary SMT theories are 
supported, as long as each theory itself supports tree interpolation for 
its lemmas. In particular, we show this for the theory combination of 
equality with uninterpreted functions and linear arithmetic. The inter- 
polants can be tweaked by virtually assigning each literal in the proof 
to interpolation partitions (colouring the literals) in arbitrary ways. The 
algorithm is implemented in SMTInterpol. 


Keywords: Tree Interpolation · Quantified Formulas · SMT


1 Introduction 


Craig interpolants [7] were originally proposed to reason about proof complex- 
ity. In the last two decades, research reignited when interpolants proved useful 
for software verification, in particular for generating invariants [15]. Tree inter- 
polants are useful for verifying programs with recursion [12], and for solving 
non-linear Horn-clause constraints [23], which can be used for thread modu- 
lar reasoning [10,16] and verifying array programs [20]. For many verification 
problems, reasoning about first-order quantified formulas is needed. Quantified 
formulas are, among others, needed to model unsupported theories or to express 
global properties of arrays [19], for example, sortedness [3,24]. 

An interpolation problem is an unsatisfiable conjunction of several input for- 
mulas, the partitions of the interpolation problem. An interpolant summarises 
the contribution of a single or multiple partitions to the unsatisfiability. Inter- 
polants can be computed from resolution proofs. However, most methods require 
localised proofs where each literal is associated with some input partition [22]. 
Proofs generated by SMT solvers, especially with quantifier instantiations, usu- 
ally contain mixed terms and literals created during the solving process that 
cannot be associated with a single input formula. 

In this paper, we extend our work on proof tree preserving sequence interpo-
lation of quantified formulas [13]. The method presented therein allows for the 
computation of inductive sequence interpolants from instantiation-based resolu- 
tion proofs of quantified formulas in the theory of uninterpreted functions. The 
key idea of this method is to perform a virtual modification of mixed terms intro- 
duced through quantifier instantiations, thus allowing to compute an inductive 
sequence of interpolants on a single, non-local proof tree. 

We extend the interpolation algorithm to compute tree interpolants and to 
support arbitrary SMT theories (with the single restriction that such a theory 
itself must support tree interpolation for its lemmas). We simplify the treat- 
ment of mixed terms by virtually flattening all literals independently of the 
partitioning. We show that the literals can be coloured (assigned to a partition) 
arbitrarily, and that for every colouring, correct interpolants are produced. The 
interpolants contain quantifiers for the flattening variables that bridge different 
partitions, and by choosing colours sensibly the number of quantifiers can be 
reduced. In contrast to previous works [1,12] which produce tree interpolants by 
repeated binary interpolation and require multiple proofs, our method computes 
a tree interpolant from a single proof. 


Related Work. Many practical algorithms to compute interpolants have been 
presented. We focus here on proof-based methods that either work in the presence 
of quantifiers, or that can compute tree interpolants, or both. 

Our work builds on the method presented in [4] for computing interpolants 
from instantiation-based proofs in SMT. It is based on purifying quantifier 
instantiations by introducing variables for terms not fitting the partition, and 
adding defining auxiliary equalities as a new input clause in the proof. Our 
method introduces these variables and equalities only virtually for computing 
the partial interpolants. Thus, tree interpolants can be computed from a single 
proof of unsatisfiability, while in [4] a purified proof is required for each partition. 

There exist several methods to compute interpolants for quantified formu- 
las inductively from superposition-based proofs. In [2], each literal is given a 
label (similar to our colouring) used to project the clause to the different parti- 
tions. First, a provisional interpolant is computed that may contain local sym- 
bols. These symbols are replaced by quantified variables to obtain an inter- 
polant. In contrast to our method, the approach only works when the provi- 
sional interpolants contain at most local constants, i.e., no local functions or 
predicates, and the assignment of labels is not as flexible as our colouring. The
method in [17] is based on a slightly modified proof, where substitution steps 
are done separately. First, a relational interpolant is computed, which may con- 
tain local function symbols, but only shared predicates. In logic without equality, 
or when the only local symbols are constants, the relational interpolant can be 
turned into an interpolant by quantifying over non-shared terms, respecting their 
dependencies. 

A very different method based on summarising subproofs is presented in [9]. 
The proof is split into subproofs belonging to a single partition. The relevant 
subproofs are summarised in an intermediant stating that their premises imply 
their conclusion. If the subproofs contain only symbols of the respective partition,
the resulting formula is an interpolant. If the proof can be split in that way, the 
method works for any theory and proof system, but for tree interpolation, a 
different proof would be required for each partitioning. 

Tree interpolants can be computed by repeated binary interpolation from 
formulas where the children interpolants are included, as discussed in [12]. In 
the propositional setting, [11] discusses under which conditions sets of inter- 
polants with certain relations, such as tree interpolants, can be obtained by 
binary interpolation on different partitionings of the same formula. The method 
is implemented in OpenSMT, but the solver, and therefore the interpolation 
engine, does not support quantifiers. 

A general framework for computing tree interpolants for ground formulas 
from a single proof has been presented in [5]. It works for combinations of 
equality-interpolating theories and is based on projecting mixed literals using
auxiliary variables and predicates. Additionally, the rule for computing a resol- 
vent’s interpolant from its antecedents’ interpolants is more involved. The 
method cannot deal with quantifier instantiations, nor with terms mixing sub- 
terms from different partitions. We discuss in Sect. 6 how it can be combined
with the interpolation method for quantified formulas presented in this paper. 

The first implementation of a tree interpolation algorithm in the presence of 
quantifiers and theories was in Vampire [1]. It is based on repeatedly computing 
binary interpolants for modified interpolation problems, similar to [12]. For each 
binary computation, the proof must be localised in order to be able to compute 
interpolants. In contrast, our method computes tree interpolants in one go from 
a single proof that has been obtained without knowledge of the partitioning of 
the tree interpolation problem. To the best of our knowledge, Vampire is the only 
other tool that is able to compute tree interpolants in the presence of quantifiers. 


2 Notation 


We assume that the reader is familiar with first-order logic. We define a theory 
T by its signature, that contains constant, function and predicate symbols, and 
its set of axioms, closed formulas that fix the meaning of those function and 
predicate symbols that are interpreted by the theory. 

A term is a variable or the application of an n-ary function symbol to n 
terms. An atom is the application of an n-ary predicate to n terms, and a literal 
is an atom or its negation. A clause is a disjunction of literals, and a formula 
is in conjunctive normal form (CNF) if it is a conjunction of clauses. We use $\top$
(resp. $\bot$) for the formula that is always true (resp. false).

We will demonstrate our algorithm using the theory of equality, and the the- 
ory of linear arithmetic (with rationals and/or integers). The theory of equal- 
ity establishes reflexivity, symmetry, and transitivity of the equality predicate 
=, and congruence for each uninterpreted function symbol. For simplicity of 
the presentation, uninterpreted constants are considered as 0-ary functions, and 
uninterpreted predicate symbols as uninterpreted functions with Boolean return 
value. The theory of linear arithmetic contains the predicates $\le$, $<$, rational
constants c, the binary addition function +, and a family of unary multiplication
functions $c\,\cdot$, one for each rational constant c. These symbols have their usual
semantics, and the main theory lemmas are trichotomy ($x < y \vee x = y \vee x > y$)
and a variant of Farkas' lemma. For simplicity, we apply arithmetic conversions
implicitly and treat $x \le y$ and $y \ge x$ and $1 \cdot x + (-1) \cdot y \le 0$ as the same literal,
and $x > y$ as its negated literal.

We denote constants by a,b,c, functions by f,g,h, variables by v, x,y,z, and 
terms by s,t. We use $\ell$ for literals, C for clauses, and $\phi$, F, I for formulas.

For a term t, the outermost (or head) function symbol is denoted by hd(t). 
The set of all uninterpreted function symbols occurring in a formula F is symb(F)
and the set of all free variables in F is FreeVars(F). The result of substituting in a
formula F each occurrence of a variable x by a term t is denoted by $F\{x \mapsto t\}$. By
$\bar{x}$ and $\bar{t}$, we denote the lists of variables $x_1, \dots, x_n$ and terms $t_1, \dots, t_n$, respectively.
We use the symbol $\equiv$ to denote equivalence between formulas, and to assign a
formula to a formula variable.


3 Preliminaries 


Craig Interpolation. A binary Craig interpolant [7] for an unsatisfiable conjunction
$A \wedge B$ is a formula I that is implied by A, contradicts B, and contains
only symbols that occur in both A and B. Tree interpolants generalise this notion
by arranging several partitions in a tree-like structure.


Definition 1 (Tree interpolation). A tree interpolation problem (V, E, F)
is a labelled binary tree where V is a set of nodes connected by directed edges
$E \subseteq V \times V$ pointing towards the root node. Every node except for the root node
has one outgoing edge to its parent, and each non-leaf node has exactly two
incoming edges. The partitions $P \subseteq V$ of the tree interpolation problem are the
leaf nodes. The labelling function F assigns a formula to each partition $p \in P$ of
the tree such that their conjunction is unsatisfiable. We use $st(v) \subseteq P$ to denote
the set of leaves in the subtree of the node v, i.e., the set of leaves for which a
path to the node v exists.

A tree interpolant for the interpolation problem (V, E, F) is a labelling func- 
tion I for all nodes with the following properties: 


1. The label $I(v_r)$ of the root node $v_r$ of the tree is $\bot$.
2. For each leaf node $p \in P$, its interpolant I(p) is implied by the formula F(p).
3. For each inner node $v \in V \setminus P$, its interpolant I(v) is implied by the conjunction
$I(v_1) \wedge I(v_2)$ of the interpolants labelled to the two child nodes $v_1$, $v_2$.
4. For each node v, the symbols in I(v) occur both inside and outside the subtree
st(v), i.e., $symb(I(v)) \subseteq \bigl(\bigcup_{p \in st(v)} symb(F(p))\bigr) \cap \bigl(\bigcup_{p \notin st(v)} symb(F(p))\bigr)$.
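
The symbol condition (property 4) can be made concrete with a few lines of Python. The sketch below computes st(v) and the set of symbols an interpolant I(v) may use; the tree, the node names, and the symbol sets are invented for this illustration, and the helper names `subtree_leaves` and `allowed_symbols` are hypothetical.

```python
def subtree_leaves(node, children):
    """st(v): the leaves below `node` (a leaf is its own subtree)."""
    kids = children.get(node, ())
    if not kids:
        return {node}
    return set().union(*(subtree_leaves(k, children) for k in kids))


def allowed_symbols(node, children, symbols):
    """Symbols occurring both inside and outside the subtree of `node`."""
    leaves = set(symbols)
    inside = subtree_leaves(node, children)
    outside = leaves - inside
    sym_in = set().union(*(symbols[p] for p in inside))
    sym_out = set().union(*(symbols[p] for p in outside))
    return sym_in & sym_out


# Hypothetical binary tree: root has children a and bc; bc has children b and c.
children = {"root": ("a", "bc"), "bc": ("b", "c")}
# Hypothetical symbol sets symb(F(p)) of the three leaf formulas.
symbols = {"a": {"f", "g"}, "b": {"g", "h"}, "c": {"h", "k"}}

for v in ["a", "b", "c", "bc", "root"]:
    print(v, sorted(allowed_symbols(v, children, symbols)))
```

For the root, nothing lies outside the subtree, so no symbols are allowed at all, which fits the requirement that the root is labelled with $\bot$.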


Remarks. In contrast to the earlier definition of tree interpolation [1,5], only
the leaves of the tree are labelled by F here. A tree interpolation problem with 
labelled inner nodes can be transformed to our formalism by adding a leaf child 
to each such node. A non-binary tree can be extended to a binary tree by adding 
more internal nodes. If the interpolants of the newly created nodes are ignored, 
the remaining interpolants are tree interpolants according to the earlier definition 
for tree interpolation. 

A binary interpolant of A and B corresponds to the tree interpolant of the 
tree containing just two leaves A and B, more precisely, it is the interpolant 
labelled to the first leaf. Vice versa, each interpolant I(v) of a tree interpolant 
is also a binary interpolant of the formulas in the partitions A := st(v) and
$A^c := P \setminus st(v)$. Since the set A defines v uniquely, we can also use $I_A$ to denote
I(v). We call a symbol A-local if it only occurs in partitions in A, $A^c$-local if it
only occurs in partitions in $A^c$, and shared if it occurs in both. The interpolant
may only contain shared symbols. 


Theory Combination. We assume that the solver uses Nelson-Oppen style theory
combination sharing equalities without explicitly introducing auxiliary vari-
ables, and that each lemma in the proof belongs to one theory. Subterms in these 
lemmas containing symbols from a different theory are treated as if they were 
auxiliary variables. We further assume that there is a theory-specific interpolation
procedure for the lemmas. In this paper, we do not assume
that theories are equality-interpolating. Instead, we introduce quantifiers in the
interpolants for such theories. However, our approach can also be combined with
equality-interpolating theories and corresponding procedures to avoid quanti- 
fiers, see Sect. 6. 


CNF Transformation and Quantifiers. We assume that complex input formulas 
are transformed to CNF by Tseitin-encoding, which introduces Boolean proxy 
atoms. Existentially quantified variables are replaced with Skolem constants or 
functions (if nested under a universal quantifier) and conjunctions are lifted 
over universal quantifiers. Complex subformulas under a universal quantifier 
are replaced by uninterpreted predicates, taking as arguments the quantified 
variables. Quantified Tseitin-style axioms give the meaning for these predicates. 
Thus, we end up with quantified clauses of the form $\forall \bar{x}.\ \ell_1(\bar{x}) \vee \dots \vee \ell_n(\bar{x})$,
which we treat as a proxy literal. Instances of quantified clauses are created
using instantiation lemmas of the form $\neg(\forall \bar{x}.\ \ell_1(\bar{x}) \vee \dots \vee \ell_n(\bar{x})) \vee \ell_1(\bar{t}) \vee \dots \vee \ell_n(\bar{t})$
where $\bar{t}$ are ground terms. Note that the proxy atom for a quantified formula
occurs only positively in input clauses and negated in instantiation lemmas. We 
note that all preprocessing steps are done locally for each input formula, and 
that auxiliary predicates and Skolem functions are fresh predicate or function 
symbols. An interpolant of the preprocessed formulas is also an interpolant of the 
original formulas, because the auxiliary symbols are not shared between different 
input formulas and will never appear in the interpolant. 


Proofs. A resolution proof for the unsatisfiability of a formula in CNF is a
derivation of the empty clause $\bot$ using the resolution rule


\[ \frac{C_1 \vee \ell \qquad C_2 \vee \neg\ell}{C_1 \vee C_2} \]


where $C_1$ and $C_2$ are clauses, and $\ell$ is a literal called the pivot (literal). A
resolution proof can be represented by a tree, or more generally, if the same
subproof is used more than once, by a directed acyclic graph (DAG). In our
setting, the DAG has three types of leaves: input clauses, theory lemmas, i.e.,
clauses that are valid in the theory T, and instantiation lemmas of the form
$\neg(\forall \bar{x}.\ \phi(\bar{x})) \vee \phi(\bar{t})$. The inner nodes are clauses obtained by resolution, and the
unique root node is the empty clause $\bot$.

Binary interpolants can be computed from a resolution proof by computing 
so-called partial interpolants for each clause. Each proof step proves a clause $C$ as
a consequence of the input $A \land B$, hence it proves that $A \land B \land \lnot C$ is unsatisfiable.
If each literal in the proof is assigned to, or coloured with, either partition $A$
or $B$, a partial interpolant for each intermediate step is the interpolant of
$A \land \lnot C \downharpoonright A$ and $B \land \lnot C \downharpoonright B$, where the projection $\lnot C \downharpoonright A$ extracts from the
conjunction $\lnot C$ all literals that are coloured with partition $A$. McMillan showed
for propositional logic that partial interpolants (cf. Definition 2 in [18]) can be 
computed recursively for each resolution step as the disjunction of the partial 
interpolants of the antecedents if the pivot is coloured as A, and their conjunction 
if it is coloured as B. 


4 Colouring of Terms and Literals 


In this section, we fix an interpolation problem $(V, E, F)$, with partitions $P \subseteq V$.
We use the following example to illustrate our interpolation algorithm. 


Example 1 (Running example). Take the tree interpolation problem with nodes
$V = \{123, 1, 23, 2, 3\}$ and edges $E = \{(1, 123), (23, 123), (2, 23), (3, 23)\}$ (see also
Fig. 1), where the partitions $P = \{1, 2, 3\}$ are labelled with $F(p) = \phi_p$ where

$$\phi_1 \equiv \forall x.\ g(h(x)) \le x, \qquad \phi_2 \equiv \forall y.\ g(y) \ge b, \qquad \phi_3 \equiv \forall z.\ f(g(z)) \ne f(b).$$

The conjunction of the three formulas is unsatisfiable. Instantiating $\phi_1$ with $b$
gives $g(h(b)) \le b$. Instantiating $\phi_2$ with $h(b)$ gives $g(h(b)) \ge b$. Together they
imply $g(h(b)) = b$. However, this contradicts $\phi_3$ instantiated with $h(b)$. This proof
creates, among others, the new literal $g(h(b)) \le b$. The term $g(h(b))$ contains
function symbols that do not occur in a common partition.


We recall that by $\mathrm{symb}(F(p))$, we denote the uninterpreted function symbols
occurring in the formula $F(p)$. We also keep track of the partitions where a
symbol occurs:


Definition 2 (Partitions). The partitions of a function symbol $f$ are the partitions
where this symbol occurs:

$$\mathrm{partitions}(f) = \{p \in P \mid f \in \mathrm{symb}(F(p))\}.$$
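Definition 2 can be checked mechanically on the running example. The following is a minimal sketch, not part of the described implementation: it assumes terms are encoded as nested tuples `(symbol, arg1, ..., argn)` and quantified variables as plain strings, and the names `TERMS`, `symbols`, `symb`, and `partitions` are chosen here for illustration only.

```python
# Terms of the running example, one list per partition (hypothetical encoding):
# a term is a tuple (symbol, arg1, ..., argn); a quantified variable is a string.
TERMS = {
    1: [("g", ("h", "x")), "x"],            # phi_1: forall x. g(h(x)) <= x
    2: [("g", "y"), ("b",)],                # phi_2: forall y. g(y) >= b
    3: [("f", ("g", "z")), ("f", ("b",))],  # phi_3: forall z. f(g(z)) != f(b)
}


def symbols(term):
    """Uninterpreted function symbols occurring in a term."""
    if isinstance(term, str):  # quantified variable, not a function symbol
        return set()
    head, *args = term
    return {head}.union(*(symbols(a) for a in args))


def symb(p):
    """symb(F(p)): the symbols occurring in the formula of partition p."""
    return set().union(*(symbols(t) for t in TERMS[p]))


def partitions(f):
    """partitions(f): all partitions whose formula mentions the symbol f."""
    return {p for p in TERMS if f in symb(p)}


for f in "bfgh":
    print(f, sorted(partitions(f)))
```

The printed sets agree with the occurrences used throughout the running example: `g` is the only symbol shared by all three partitions, while `h` is local to partition 1.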
McMillan’s interpolation algorithm assumes that all symbols of a literal occur
in one partition, such that the literal can be coloured with that partition. This
is no longer the case in SMT, because new literals are created during the proof
search, especially in the presence of instantiation lemmas. Our solution to this
problem is to split each literal into many smaller literals and assign each of them
to a partition. To keep the presentation simple, we flatten all (non-proxy) literals
using a fresh variable for each application term. Thus, for every term $t$ occurring
in the resolution proof, we create a fresh variable $v_t$ and associate with it a set of
flattening equalities. In each literal, the top-level terms are replaced with their
associated variable, and the defining equalities are conjoined.


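Definition 3 (Flattening). For a term $t$, we introduce a fresh variable $v_t$, and
similarly for all its subterms. The associated set of flattening equalities $\mathrm{FlatEQ}(t)$
is defined as follows:

$$\mathrm{FlatEQ}(t) = \{ v_{f(t_1,\dots,t_n)} = f(v_{t_1},\dots,v_{t_n}) \mid f(t_1,\dots,t_n) \text{ is a subterm of } t \}.$$

The flattened version of a literal $\ell$ is

$$\mathrm{flatten}(\ell) = \begin{cases} v_{t_1} = v_{t_2} & \text{if } \ell \equiv t_1 = t_2 \\ c_1 \cdot v_{t_1} + \dots + c_n \cdot v_{t_n} \le c & \text{if } \ell \equiv c_1 \cdot t_1 + \dots + c_n \cdot t_n \le c \end{cases}$$

and the associated set of flattening equalities is as follows:

$$\mathrm{FlatEQ}(\ell) = \begin{cases} \mathrm{FlatEQ}(t_1) \cup \mathrm{FlatEQ}(t_2) & \text{if } \ell \equiv t_1 = t_2 \\ \mathrm{FlatEQ}(t_1) \cup \dots \cup \mathrm{FlatEQ}(t_n) & \text{if } \ell \equiv c_1 \cdot t_1 + \dots + c_n \cdot t_n \le c. \end{cases}$$

The flattened version of a negated literal is the negation of the flattened literal,
i.e., $\mathrm{flatten}(\lnot\ell) = \lnot\mathrm{flatten}(\ell)$. The set of flattening equalities for a negated
literal is the set of flattening equalities for the literal itself, i.e.,
$\mathrm{FlatEQ}(\lnot\ell) = \mathrm{FlatEQ}(\ell)$.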


The conjunction of the equalities in $\mathrm{FlatEQ}(t)$ implies that $v_t = t$. Similarly, the
conjunction $\mathrm{flatten}(\ell) \land \bigwedge \mathrm{FlatEQ}(\ell)$ implies the literal $\ell$ and is equisatisfiable to
$\ell$. Proxy literals like quantified formulas are not flattened, as they will never occur
in a partial interpolant. For such a proxy literal, $\mathrm{flatten}(\forall x.\phi(x)) = \forall x.\phi(x)$ and
$\mathrm{FlatEQ}(\forall x.\phi(x)) = \emptyset$.


Example 2 (Flattening). Consider the literal $g(h(b)) \le b$. Its flattened version
is $\mathrm{flatten}(g(h(b)) \le b) = v_{g(h(b))} \le v_b$, and the set of flattening equalities is

$$\begin{aligned} \mathrm{FlatEQ}(g(h(b)) \le b) &= \mathrm{FlatEQ}(g(h(b))) \cup \mathrm{FlatEQ}(b) \\ &= \{ v_{g(h(b))} = g(v_{h(b)}),\ v_{h(b)} = h(v_b),\ v_b = b \}. \end{aligned}$$
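The flattening of Example 2 can be reproduced with a few lines of code. This is an illustrative sketch under assumptions of our own: ground terms are nested tuples, the variable $v_t$ is rendered as the string `"v_" + t`, and the helper names (`name`, `var`, `flat_eq`, `flatten_leq`) are hypothetical.

```python
def name(term):
    """Render a ground term, e.g. ("g", ("h", ("b",))) as "g(h(b))"."""
    head, *args = term
    return head if not args else f"{head}({','.join(name(a) for a in args)})"


def var(term):
    """Name of the fresh flattening variable v_t for the term t."""
    return "v_" + name(term)


def flat_eq(term):
    """FlatEQ(t): one defining equality for every subterm of t."""
    head, *args = term
    rhs = head if not args else f"{head}({','.join(var(a) for a in args)})"
    equalities = {f"{var(term)} = {rhs}"}
    for arg in args:
        equalities |= flat_eq(arg)
    return equalities


def flatten_leq(lhs, rhs):
    """Flattened form of the literal lhs <= rhs and its flattening equalities."""
    return f"{var(lhs)} <= {var(rhs)}", flat_eq(lhs) | flat_eq(rhs)


b = ("b",)
literal, equalities = flatten_leq(("g", ("h", b)), b)
print(literal)
for eq in sorted(equalities):
    print("  ", eq)
```

Only the inequality case of Definition 3 with unit coefficients is covered; equalities and general linear combinations would follow the same pattern.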


To define partial interpolants, we colour each atom $\ell$ with some partition,
denoted by $\mathrm{colour}(\ell) \in P$. The negated atom always has the same colour. For
proxy atoms created during the CNF conversion, it is important to colour them
with the input partition from which they were created. The colour of other
literals can be chosen arbitrarily, but a good heuristic would choose a partition
where most of the outermost function symbols occur. Each flattening equality is
associated with all partitions where the corresponding function symbol occurs.
The projection of auxiliary equations on a partition $p$, denoted by $\mathrm{FlatEQ}(\ell) \downharpoonright p$,
is defined as the conjunction of the equalities
$(v_{f(t_1,\dots,t_n)} = f(v_{t_1},\dots,v_{t_n})) \in \mathrm{FlatEQ}(\ell)$ where $p \in \mathrm{partitions}(f)$.

Finally, we define the projection of a literal $\ell$ to a partition $p$. The projection
kernel $\ell \downharpoonright^- p$ is $\mathrm{flatten}(\ell)$ if $p = \mathrm{colour}(\ell)$ or $\top$ otherwise. The projection of $\ell$ to
$p$ is defined as $\ell \downharpoonright p = \ell \downharpoonright^- p \land \mathrm{FlatEQ}(\ell) \downharpoonright p$. We define the projection to a set
of partitions $\ell \downharpoonright A$ with $A \subseteq P$ (and similarly $\ell \downharpoonright^- A$) as the conjunction of all
projections $\ell \downharpoonright p$ with $p \in A$. For a conjunction of literals $F = \ell_1 \land \dots \land \ell_n$, we
define $F \downharpoonright p = \ell_1 \downharpoonright p \land \dots \land \ell_n \downharpoonright p$, and similarly for $F \downharpoonright A$, $F \downharpoonright^- p$ and $F \downharpoonright^- A$.


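Example 3 (Projection of literals). Consider again the literal $g(h(b)) \le b$ from
our running example (Example 1), and assume that we arbitrarily assign it to
partition 2, i.e., $\mathrm{colour}(g(h(b)) \le b) = 2$. We have $\mathrm{partitions}(g) = \{1, 2, 3\}$,
$\mathrm{partitions}(h) = \{1\}$ and $\mathrm{partitions}(b) = \{2, 3\}$. The projections are hence:

$$\begin{aligned} g(h(b)) \le b \downharpoonright 1 &\equiv v_{g(h(b))} = g(v_{h(b)}) \land v_{h(b)} = h(v_b) \\ g(h(b)) \le b \downharpoonright 2 &\equiv v_{g(h(b))} \le v_b \land v_{g(h(b))} = g(v_{h(b)}) \land v_b = b \\ g(h(b)) \le b \downharpoonright 3 &\equiv v_{g(h(b))} = g(v_{h(b)}) \land v_b = b \end{aligned}$$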
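The projections of Example 3 follow directly from the definitions: the kernel contributes the flattened literal only for its own colour, and each flattening equality goes to every partition of its function symbol. The sketch below assumes the same tuple encoding of terms as an illustration; `PARTITIONS`, `flat_eq`, and `project` are hypothetical names, and a conjunction is represented as a list of conjunct strings (the empty list standing for $\top$).

```python
PARTITIONS = {"g": {1, 2, 3}, "h": {1}, "b": {2, 3}}  # as stated in Example 3


def name(term):
    head, *args = term
    return head if not args else f"{head}({','.join(name(a) for a in args)})"


def flat_eq(term):
    """FlatEQ(t) as pairs (function symbol, defining equality)."""
    head, *args = term
    rhs = head if not args else f"{head}({','.join('v_' + name(a) for a in args)})"
    equalities = {(head, f"v_{name(term)} = {rhs}")}
    for arg in args:
        equalities |= flat_eq(arg)
    return equalities


def project(flat_literal, colour, terms, p):
    """Conjuncts of the projection of a literal onto partition p."""
    # Projection kernel: the flattened literal for its own colour, else "true".
    kernel = [flat_literal] if p == colour else []
    # Flattening equalities are kept wherever their function symbol occurs.
    equalities = set().union(*(flat_eq(t) for t in terms))
    return kernel + sorted(eq for f, eq in equalities if p in PARTITIONS[f])


b = ("b",)
ghb = ("g", ("h", b))
for p in (1, 2, 3):
    print(p, " and ".join(project("v_g(h(b)) <= v_b", 2, [ghb, b], p)))
```

Changing the `colour` argument from 2 to another partition moves only the kernel; the distribution of the flattening equalities is independent of the colouring.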


Similar to the last paragraph in Sect. 3, we define a partial interpolant of
a clause $C$ as an interpolant of the input problem and $\lnot C$. More precisely, it
is the tree interpolant of a slightly modified tree interpolation problem, where
the projection $\lnot C \downharpoonright p$ is added to each leaf node $p \in P$. Since this step adds
flattening variables potentially shared between several partitions, these variables
can occur in the interpolants. The following definition accounts for the variables
occurring in the projection of a clause.


Definition 4 (Supported variable). We call a variable $v_t$ supported by a
clause $C$ if its corresponding term $t$ is a subterm of a non-proxy literal $\ell$ in $C$.


The partial tree interpolant of a clause $C$ may then contain a variable $v_t$ as
long as it is supported by the clause $C$.


Definition 5 (Partial tree interpolant). A partial tree interpolant for a
clause $C$ is a tree interpolant as defined in Definition 1 for the tree interpolation
problem $(V, E, F')$ where the leaves are labelled with $F'(p) = F(p) \land \lnot C \downharpoonright p$.
For the symbol condition, all variables supported by the clause may occur in all
partial interpolants.


5 Interpolation for Quantified Formulas 


In the following, we describe how to compute tree interpolants for instantiation- 
based resolution proofs. We assume that each literal has been assigned to exactly
one partition of the tree interpolation problem, as described in the previous
section. Following McMillan’s algorithm, we compute partial tree interpolants 
inductively over the proof tree. The leaves of the proof tree are theory lem- 
mas, for which we use theory-specific interpolation procedures, or they are input 
clauses or instantiation lemmas, for which we compute partial tree interpolants 
as described below. The inner nodes are obtained by resolution steps, for which 
we follow McMillan’s algorithm to combine interpolants, and additionally treat 
variables that violate the symbol condition, as described later in this section. 


5.1 Interpolation Algorithm 


We start by explaining how the interpolants for leaf nodes are computed. Our
algorithm computes interpolants separately for each node $v \in V$ in the tree
interpolation problem. As mentioned in the preliminaries, we set $A = \mathrm{st}(v)$ and
use $I_A$ to denote the interpolant $I(v)$.


Input Clauses. We assume that each input clause occurs in exactly one partition.
The partial tree interpolant for an input clause $C$ from partition $p$ is given by
$I_A \equiv \lnot(\lnot C \downharpoonright^- A^c)$ if $p \in A$, and $I_A \equiv \lnot C \downharpoonright^- A$ if $p \notin A$.

Note that the literals can be assigned to a different partition than the clause.
Although it makes sense to assign a literal to the same partition as the input
clause it occurs in, this is not possible when the literal occurs in several input
clauses. Therefore, the above formulas are not necessarily $\top$ or $\bot$. Proxy literals
always have the same colour as the input clause and will therefore never appear
in the interpolant.


Instantiation Lemmas. The partial tree interpolant for an instantiation lemma $C$
obtained from a quantified input clause $\forall x.\phi(x)$ from partition $\mathrm{colour}(\forall x.\phi(x))$
is computed in the same way as for input clauses.
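The rule for input clauses and instantiation lemmas can be sketched in a few lines. The encoding below is an assumption made for illustration: a conflict $\lnot C$ is a list of triples `(atom, is_positive, colour)`, a result `("and", [...])` or `("or", [...])` stands for a conjunction or disjunction of literals (so `("and", [])` is $\top$ and `("or", [])` is $\bot$), and the colouring of the sample lemma (proxy literal in partition 1, arithmetic literal in partition 2) is one possible choice.

```python
P = {1, 2, 3}  # leaf partitions of the running example


def kernel(conflict, A):
    """Projection kernel of a conflict onto the set of partitions A:
    the literals of the conflict whose colour lies in A."""
    return [(atom, pos) for atom, pos, colour in conflict if colour in A]


def leaf_interpolant(conflict, p, A):
    """I_A for an input clause or instantiation lemma from partition p."""
    if p in A:
        # Negating the conjunction over the complement of A yields a
        # disjunction of the negated literals.
        return ("or", [(atom, not pos) for atom, pos in kernel(conflict, P - A)])
    return ("and", kernel(conflict, A))


# Instantiation lemma  not(phi_1) or g(h(b)) <= b  from partition 1.
# Its conflict is  phi_1 and not(g(h(b)) <= b), given here in flattened form.
LEQ = "v_g(h(b)) <= v_b"
CONFLICT = [("phi_1", True, 1), (LEQ, False, 2)]

for A in ({1}, {2}, {3}, {2, 3}, {1, 2, 3}):
    print(sorted(A), leaf_interpolant(CONFLICT, 1, A))
```

Because the proxy literal is coloured with the partition of its input clause, it is filtered out in both branches and never reaches the interpolant.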


Theory Lemmas. We only require that for each theory one can compute a partial 
tree interpolant for its lemmas, or to be more precise, the flattened negated 
lemmas. Thus, we can reuse any existing procedure. For self-containment, we 
cover transitivity, congruence, trichotomy and Farkas lemmas, which are the kind 
of lemmas our solver produces for the theory of equality and linear arithmetic.¹

For a transitivity lemma with the corresponding conflict
$\lnot C \equiv t_1 = t_2 \land \dots \land t_{n-1} = t_n \land t_1 \ne t_n$ we can ignore the auxiliary equations introduced by
flattening the terms, as the projection kernel is also a transitivity lemma. A
partial tree interpolant is computed by summarising for each $A$ the chains of the
flattened equalities (and, if applicable, the single disequality) that are assigned
to a partition $p \in A$. More precisely, let $i_1 < \dots < i_m$ be the boundary indices
such that $\mathrm{colour}(t_{i_j - 1} = t_{i_j}) \in A$ and $\mathrm{colour}(t_{i_j} = t_{i_j + 1}) \notin A$ or vice versa. Set
$i_1 = 1$ if $t_1 \ne t_n$ and $t_1 = t_2$ are in different partitions and $i_m = n$ if $t_{n-1} = t_n$
and $t_1 \ne t_n$ are in different partitions. If $m = 0$, then all colours of the equalities
are in $A$ and the interpolant is $\bot$, or they are all in $A^c$ and the interpolant is
$\top$. Otherwise, the interpolant summarises the equalities between the boundary
indices that have a colour in $A$: if $\mathrm{colour}(t_1 \ne t_n) \notin A$, then the interpolant
is $I_A \equiv v_{i_1} = v_{i_2} \land v_{i_3} = v_{i_4} \land \dots \land v_{i_{m-1}} = v_{i_m}$, otherwise the interpolant is
$I_A \equiv v_{i_2} = v_{i_3} \land \dots \land v_{i_{m-2}} = v_{i_{m-1}} \land v_{i_m} \ne v_{i_1}$. Here, $v_i$ denotes the auxiliary
variable introduced for $t_i$.

¹ Branches in linear integer arithmetic [8] are decisions on inequality literals and are
handled by our resolution rule.

The flattened version of the conflict corresponding to a congruence lemma
$C \equiv f(t_1,\dots,t_n) = f(s_1,\dots,s_n) \lor t_1 \ne s_1 \lor \dots \lor t_n \ne s_n$ is

$$\begin{aligned} & v_{f(t_1,\dots,t_n)} \ne v_{f(s_1,\dots,s_n)} \land v_{t_1} = v_{s_1} \land \dots \land v_{t_n} = v_{s_n} \\ & \land\ v_{f(t_1,\dots,t_n)} = f(v_{t_1},\dots,v_{t_n}) \land v_{f(s_1,\dots,s_n)} = f(v_{s_1},\dots,v_{s_n}) \\ & \land\ \bigwedge \{\ell \mid \ell \in \mathrm{FlatEQ}(t),\ t \in \{t_1,\dots,t_n,s_1,\dots,s_n\}\}. \end{aligned}$$

Note that the formula is still a congruence conflict if we drop the last line.
Consequently, the flattening equalities for the arguments of the $f$-applications,
and for their subterms, are not needed in the computation of a partial interpolant;
they only establish the implication between the flattened and the original
lemma. To obtain a partial tree interpolant, we first choose an arbitrary
partition $p_f \in \mathrm{partitions}(f)$. The partial tree interpolant is computed as follows.

$$I_A \equiv \begin{cases} \lnot(\lnot C \downharpoonright^- A^c) & \text{if } p_f \in A \\ \lnot C \downharpoonright^- A & \text{otherwise} \end{cases}$$

For a trichotomy lemma $C \equiv t_1 = t_2 \lor t_1 > t_2 \lor t_1 < t_2$, both $I_A \equiv \lnot C \downharpoonright^- A$
and $I_A \equiv \lnot(\lnot C \downharpoonright^- A^c)$ are partial interpolants. We can always choose the
projection that contains at most one literal.

A Farkas lemma has the form $C \equiv \lnot(s_1 \le b_1) \lor \dots \lor \lnot(s_n \le b_n)$ where $s_i$
is of the form $c_{i1} \cdot v_1 + \dots + c_{im} \cdot v_m$ and $b_i$, $c_{ij}$ are numeric (integer) constants.
It is a valid lemma if there are Farkas coefficients (numeric integer constants)
$k_1, \dots, k_n \ge 0$ with $\sum_i k_i \cdot s_i = 0$ and $\sum_i k_i \cdot b_i < 0$. We assume that
the lemma is flattened and all $v_j$ are variables. The flattening equalities can be
omitted from the lemma without changing its validity. For a set of partitions $A$,
we denote by $L_A := \{i \mid \mathrm{colour}(s_i \le b_i) \in A\}$ the indices where $s_i \le b_i$ is $A$-local.
The partial tree interpolant for a Farkas lemma is computed by summing
up the $A$-local literals multiplied by their Farkas coefficients. We obtain
$I_A \equiv \left(\sum_{i \in L_A} k_i \cdot s_i\right) \le \left(\sum_{i \in L_A} k_i \cdot b_i\right)$. Variables whose coefficients sum to zero are
removed from the inequality. If $A$ contains all inequalities, they sum up to the
conflict $0 \le \sum_i k_i \cdot b_i$ and we set $I_A \equiv \bot$.
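The summation for Farkas lemmas is easy to try out on a small instance. The sketch below is illustrative only: the three inequalities, their colours, and the coefficients are a made-up example (not taken from the running example), an inequality is encoded as a pair of a coefficient dictionary and a bound, and the string `"false"` stands for $\bot$.

```python
def farkas_interpolant(ineqs, coeffs, colours, A):
    """Sum the A-local inequalities, weighted by their Farkas coefficients.

    ineqs[i] = (terms, bound) encodes  sum(terms[v] * v) <= bound.
    Returns (terms, bound) for the summed inequality, or "false" if every
    inequality is A-local.
    """
    local = [i for i, colour in enumerate(colours) if colour in A]
    if len(local) == len(ineqs):
        return "false"
    lhs, rhs = {}, 0
    for i in local:
        terms, bound = ineqs[i]
        for v, c in terms.items():
            lhs[v] = lhs.get(v, 0) + coeffs[i] * c
        rhs += coeffs[i] * bound
    # Variables whose coefficients sum to zero are removed.
    return {v: c for v, c in lhs.items() if c != 0}, rhs


# Hypothetical conflict:  x - y <= 0,  y - z <= 0,  z - x <= -1.
INEQS = [({"x": 1, "y": -1}, 0), ({"y": 1, "z": -1}, 0), ({"z": 1, "x": -1}, -1)]
COEFFS = [1, 1, 1]   # the left-hand sides sum to 0, the bounds to -1 < 0
COLOURS = [1, 2, 3]  # one inequality per partition

print(farkas_interpolant(INEQS, COEFFS, COLOURS, {1, 2}))     # x - z <= 0
print(farkas_interpolant(INEQS, COEFFS, COLOURS, {1, 2, 3}))  # false
```

For $A = \{1, 2\}$ the variable `y` cancels, so the result mentions only `x` and `z`, the variables that also occur in the remaining inequality. If no inequality is $A$-local, the function returns the trivially true inequality $0 \le 0$.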


Theorem 1. The interpolants as defined in this section are valid partial tree 
interpolants for the respective leaf nodes. 


The proof for this theorem is a straightforward case distinction over the type
of leaf node. Details can be found in [14].

Resolution Steps. In a resolution step, we obtain the partial interpolant of the
resolvent using the partial interpolants of the premises.

$$\frac{C_1 \lor \ell : I^1_A \qquad C_2 \lor \lnot\ell : I^2_A}{C_1 \lor C_2 : I^3_A}$$

As the first step, we follow McMillan’s algorithm and combine the interpolants
of the premises either with $\lor$ or with $\land$ depending on whether the pivot literal
is $A$-local or $A^c$-local. For tree interpolants, this is done separately for each node of
the tree interpolation problem, and a literal is seen as $A$-local if its colour is one
of the leaves in the subtree of the node.

$$I^3_A \equiv \begin{cases} I^1_A \lor I^2_A & \text{if } \mathrm{colour}(\ell) \in A \\ I^1_A \land I^2_A & \text{if } \mathrm{colour}(\ell) \notin A \end{cases}$$
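The combination rule itself is a one-line case distinction per tree node. The following sketch assumes interpolants are plain strings with `"true"` and `"false"` for $\top$ and $\bot$; the function name `combine` and the light simplification of trivial operands are additions made here for readability, not part of the rule.

```python
TOP, BOT = "true", "false"


def combine(i1, i2, pivot_colour, A):
    """Partial interpolant of a resolvent for the tree node with leaf set A."""
    if pivot_colour in A:  # pivot is A-local: disjunction
        if TOP in (i1, i2):
            return TOP
        if i1 == BOT:
            return i2
        if i2 == BOT:
            return i1
        return f"({i1} or {i2})"
    # pivot is local to the complement of A: conjunction
    if BOT in (i1, i2):
        return BOT
    if i1 == TOP:
        return i2
    if i2 == TOP:
        return i1
    return f"({i1} and {i2})"


# Sample inputs: a pivot coloured 1, seen from the nodes {1} and {2}.
print(combine(BOT, "v_g(h(b)) <= v_b", 1, {1}))
print(combine(TOP, "v_g(h(b)) > v_b", 1, {2}))
# A pivot coloured 2, seen from node {2}: a proper disjunction remains.
print(combine("v_g(h(b)) > v_b", "v_g(h(b)) = v_b", 2, {2}))
```

The same function is called once per node of the tree interpolation problem, each time with that node's leaf set `A`.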


The formula $I^3_A$ computed above may still contain variables supported by the
antecedents that are no longer supported by the resolvent $C_1 \lor C_2$. Each of those
unsupported variables must either be replaced by its definition or bound by a
quantifier in the partial tree interpolant. More precisely, let $v_t$ be an unsupported
variable such that $t$ is not a subterm of any other $t'$ with $v_{t'} \in \mathrm{FreeVars}(I^3_A)$. This variable
must always exist, as there is always an outermost unsupported variable. Let
$t = f(t_1,\dots,t_n)$. We replace $I^3_A$ as follows:

$$I^3_A := \begin{cases} \exists x.\ I^3_A\{v_t \mapsto x\} & \text{if } f \text{ is } A\text{-local, i.e., } \mathrm{partitions}(f) \subseteq A, \\ \forall x.\ I^3_A\{v_t \mapsto x\} & \text{if } f \text{ is } A^c\text{-local, i.e., } \mathrm{partitions}(f) \cap A = \emptyset, \\ I^3_A\{v_t \mapsto f(v_{t_1},\dots,v_{t_n})\} & \text{if } f \text{ is shared (otherwise).} \end{cases}$$

We do this repeatedly for all variables in $\mathrm{FreeVars}(I^3_A)$ that are unsupported.
The variables may be treated in any order that respects the partial order induced
by the subterm relation as described above. However, all interpolants of the tree
interpolant must use the same order.


Theorem 2. If $I^1_A$ is a partial tree interpolant of $C_1 \lor \ell$ and $I^2_A$ is a partial
tree interpolant of $C_2 \lor \lnot\ell$, then $I^3_A$ as computed above, after the removal of
unsupported variables, is a partial tree interpolant of $C_1 \lor C_2$.


The proof for this theorem is given in [14]. 


Example 4 (Resolution). Take the running example and suppose $\ell \equiv g(h(b)) = b$
is the pivot, $I^1_{\{1\}} \equiv v_{g(h(b))} \le v_b$ and $I^2_{\{1\}} \equiv \top$. The interpolants are combined
as $I^1_{\{1\}} \land I^2_{\{1\}}$ since $\mathrm{colour}(\ell) \notin \{1\}$. This results in the interpolant $v_{g(h(b))} \le v_b$.
After the resolution step, we assume that $v_{g(h(b))}$, $v_{h(b)}$, $v_b$ are no longer supported.
The outermost variable is $v_{g(h(b))}$, which must be replaced by its definition:
$g(v_{h(b)}) \le v_b$. Now $v_{h(b)}$ is bound by a quantifier, and since $h$ only
occurs in partition 1, an existential quantifier is used: $\exists y.\ g(y) \le v_b$. In the final
step, $v_b$ is bound by a universal quantifier since $b$ does not occur in 1, yielding
$\forall x.\exists y.\ g(y) \le x$.

Note that the order of eliminating variables is important. If $v_b$ had been chosen
in the first step despite occurring in $\mathrm{FlatEQ}(g(h(b)))$, the resulting formula
would have been $\exists y.\forall x.\ g(y) \le x$. This formula is not logically equivalent and is
indeed not a valid interpolant, as it does not follow from $\forall x.\ g(h(x)) \le x$.
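The elimination of unsupported variables, including the outermost-first order, can be sketched as follows. Everything in this block is an illustrative assumption rather than the implementation in the solver: expressions are nested tuples, a flattening variable $v_t$ is an instance of the hypothetical class `V`, bound variables are plain strings, and fresh names are drawn from a fixed list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class V:
    """Flattening variable v_t for a ground term t (a nested tuple)."""
    term: tuple


INFIX = {"<=", ">=", ">", "="}


def show(e):
    if isinstance(e, V):
        return "v_" + show(e.term)
    if isinstance(e, str):  # bound variable
        return e
    head, *args = e
    if head in INFIX:
        return f"{show(args[0])} {head} {show(args[1])}"
    return head if not args else f"{head}({','.join(map(show, args))})"


def free_vars(e):
    if isinstance(e, V):
        return {e}
    if isinstance(e, str):
        return set()
    return set().union(*(free_vars(a) for a in e[1:]))


def substitute(e, v, new):
    if e == v:
        return new
    if isinstance(e, (V, str)):
        return e
    return (e[0],) + tuple(substitute(a, v, new) for a in e[1:])


def is_subterm(s, t):
    return s == t or any(is_subterm(s, a) for a in t[1:])


def eliminate(body, supported, A, partitions, names=("y", "x", "z")):
    """Remove unsupported flattening variables, outermost terms first."""
    prefix, fresh = [], iter(names)
    while True:
        unsupported = sorted(
            (v for v in free_vars(body) if v.term not in supported), key=show)
        if not unsupported:
            return " ".join(prefix + [show(body)])
        # Outermost: not a subterm of the term of another unsupported variable.
        v = next(u for u in unsupported
                 if not any(u != w and is_subterm(u.term, w.term)
                            for w in unsupported))
        f, args = v.term[0], v.term[1:]
        if partitions[f] <= A:            # f is A-local
            x = next(fresh)
            prefix.insert(0, f"exists {x}.")
            body = substitute(body, v, x)
        elif not partitions[f] & A:       # f is local to the complement of A
            x = next(fresh)
            prefix.insert(0, f"forall {x}.")
            body = substitute(body, v, x)
        else:                             # f is shared: unfold the definition
            body = substitute(body, v, (f,) + tuple(V(a) for a in args))


B = ("b",)
GHB = ("g", ("h", B))
PARTS = {"g": {1, 2, 3}, "h": {1}, "b": {2, 3}}
print(eliminate(("<=", V(GHB), V(B)), set(), {1}, PARTS))
```

Each newly introduced quantifier is placed in front of the previous ones, so the variable eliminated first ends up innermost, which reproduces the quantifier order of Example 4.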


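[Figure 1: tree with root node 123, its children 1 and 23, and the children 2 and 3 of
node 23. The leaves are labelled $\phi_1: \forall x.\ g(h(x)) \le x$, $\phi_2: \forall y.\ g(y) \ge b$, and
$\phi_3: \forall z.\ f(g(z)) \ne f(b)$. The nodes are annotated with $I_{\{1,2,3\}}: \bot$,
$I_{\{2,3\}}: \exists x.\forall y.\ g(y) > x$, $I_{\{1\}}: \forall x.\exists y.\ g(y) \le x$, $I_{\{2\}}: \forall y.\ g(y) \ge b$, and
$I_{\{3\}}: \forall y.\ g(y) \ne b$.]

Fig. 1. Tree interpolation problem from Example 1 annotated by tree interpolants.


[Figure 2: resolution proof tree deriving $\bot$ from the input clauses $\phi_1$, $\phi_2$, $\phi_3$, the
instantiation lemmas $\lnot\phi_1 \lor g(h(b)) \le b$, $\lnot\phi_2 \lor g(h(b)) \ge b$, and
$\lnot\phi_3 \lor f(g(h(b))) \ne f(b)$, the trichotomy lemma
$g(h(b)) = b \lor \lnot(g(h(b)) \le b) \lor \lnot(g(h(b)) \ge b)$, and the congruence lemma
$g(h(b)) \ne b \lor f(g(h(b))) = f(b)$.]

Fig. 2. Resolution proof for Example 1 with [input clauses], [instantiation lemmas],
[theory lemmas], and [resolvents].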


Theorem 3. The algorithm in this section computes valid tree interpolants from 
a proof of unsatisfiability. 


Proof. By induction, every node in the proof tree is labelled by a valid partial tree 
interpolant: Theorem 1 is the base case and Theorem 2 the inductive step. The 
proof of unsatisfiability ends with the empty clause and its partial interpolant 
is a tree interpolant for the original problem. 


5.2 Full Interpolation Example 


We illustrate the algorithm on our running example (Example 1). Consider the
tree interpolation problem given in Fig. 1. The symbol $b$ occurs in partitions 2
and 3, $f$ in 3, $g$ in 1, 2, and 3, and $h$ in 1. Our goal is to compute tree interpolants
$I_{\{1\}}$, $I_{\{2\}}$, and $I_{\{3\}}$ for the leaf nodes such that $\phi_1$ implies $I_{\{1\}}$, $\phi_2$ implies $I_{\{2\}}$,
and $\phi_3$ implies $I_{\{3\}}$, and tree interpolant $I_{\{2,3\}}$ such that $I_{\{2,3\}}$ is implied by
$I_{\{2\}} \land I_{\{3\}}$, and $I_{\{1\}} \land I_{\{2,3\}}$ implies $\bot$.

Figure 2 shows an instantiation-based resolution proof for the unsatisfiability
of $\phi_1 \land \phi_2 \land \phi_3$. First, we assign each literal occurring in the proof tree to exactly
one partition. We colour each proxy literal for a quantified formula by a partition
in which it occurs, e.g., $\mathrm{colour}(\forall x.\ g(h(x)) \le x) = 1$. For the other literals, we
can choose arbitrary colours. We assign the literals $g(h(b)) = b$, $g(h(b)) \le b$, and
$g(h(b)) \ge b$ to partition 2, and the literal $f(g(h(b))) \ne f(b)$ to partition 3. We
then compute for each literal $\ell$ the projection onto each partition, i.e., $\ell \downharpoonright p$. For
$\ell \equiv g(h(b)) \le b$ assigned to partition 2, the projections are given in Example 3.
As $g(h(b)) \ge b$ and $g(h(b)) = b$ are assigned to the same partition as $\ell$ and only
differ in the comparison operator, their projections only differ in the comparison
operator of the flattened version of the original literal. For the remaining literal
$f(g(h(b))) = f(b)$, we get the following projections:

$$\begin{aligned} f(g(h(b))) = f(b) \downharpoonright 1 &\equiv v_{g(h(b))} = g(v_{h(b)}) \land v_{h(b)} = h(v_b) \\ f(g(h(b))) = f(b) \downharpoonright 2 &\equiv v_{g(h(b))} = g(v_{h(b)}) \land v_b = b \\ f(g(h(b))) = f(b) \downharpoonright 3 &\equiv v_{f(g(h(b)))} = v_{f(b)} \land v_{f(g(h(b)))} = f(v_{g(h(b))}) \land {} \\ &\phantom{{}\equiv{}} v_{g(h(b))} = g(v_{h(b)}) \land v_{f(b)} = f(v_b) \land v_b = b \end{aligned}$$


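We now compute partial tree interpolants for each node in the proof tree.
The first input clause $C \equiv \phi_1$ on the top left of the proof tree is from partition 1.
The partial interpolants $I_{\{1\}}$ and $I_{\{1,2,3\}}$ are set to $\lnot(\lnot C \downharpoonright^- A^c) \equiv \bot$, and $I_{\{2\}}$,
$I_{\{3\}}$, and $I_{\{2,3\}}$ are set to $\lnot C \downharpoonright^- A \equiv \top$. For the input clauses $\phi_2$ and $\phi_3$, the
interpolants are computed analogously. To summarise:

$$\begin{array}{l|ccccc} \text{clause} & I_{\{1\}} & I_{\{2\}} & I_{\{3\}} & I_{\{2,3\}} & I_{\{1,2,3\}} \\ \hline \phi_1 & \bot & \top & \top & \top & \bot \\ \phi_2 & \top & \bot & \top & \bot & \bot \\ \phi_3 & \top & \top & \bot & \bot & \bot \end{array}$$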

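We now compute the partial tree interpolants for the instantiation lemma on
the top right of the proof tree. Similarly to the input clauses, we set $I_{\{1\}}$ to
$\lnot(\lnot C \downharpoonright^- A^c)$, i.e., to $\lnot(\lnot C \downharpoonright^- 2 \land \lnot C \downharpoonright^- 3) \equiv v_{g(h(b))} \le v_b$. Analogously, we
compute all other partial tree interpolants for the three instantiation lemmas:

$$\begin{array}{l|ccccc} \text{lemma} & I_{\{1\}} & I_{\{2\}} & I_{\{3\}} & I_{\{2,3\}} & I_{\{1,2,3\}} \\ \hline \lnot\phi_1 \lor g(h(b)) \le b & v_{g(h(b))} \le v_b & v_{g(h(b))} > v_b & \top & v_{g(h(b))} > v_b & \bot \\ \lnot\phi_2 \lor g(h(b)) \ge b & \top & \bot & \top & \bot & \bot \\ \lnot\phi_3 \lor f(g(h(b))) \ne f(b) & \top & \top & \bot & \bot & \bot \end{array}$$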

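For the trichotomy lemma, the partial tree interpolants can be set to $\lnot C \downharpoonright^- A$
or $\lnot(\lnot C \downharpoonright^- A^c)$. Due to our colouring, all literals in the lemma are either in $A$
or in $A^c$. To get the most simple partial interpolants, we set $I_{\{1\}}$ and $I_{\{3\}}$ to
$\lnot C \downharpoonright^- A \equiv \top$, and $I_{\{2\}}$ and $I_{\{2,3\}}$ to $\lnot(\lnot C \downharpoonright^- A^c) \equiv \bot$:

$$\begin{array}{l|ccccc} \text{lemma} & I_{\{1\}} & I_{\{2\}} & I_{\{3\}} & I_{\{2,3\}} & I_{\{1,2,3\}} \\ \hline g(h(b)) = b \lor \lnot(g(h(b)) \le b) \lor \lnot(g(h(b)) \ge b) & \top & \bot & \top & \bot & \bot \end{array}$$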

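For the congruence lemma, we have $p_f = 3$. The partial tree interpolants
$I_{\{1\}}$ and $I_{\{2\}}$ are set to $\lnot C \downharpoonright^- A$ as $p_f \notin A$ for these partitions. We get $I_{\{1\}} \equiv \top$
(neither of the flattened literals in $\lnot C$ is contained in the projection kernel) and
$I_{\{2\}} \equiv v_{g(h(b))} = v_b$, since we chose 2 as the colour of this literal. Similarly, $I_{\{3\}}$
and $I_{\{2,3\}}$ are set to $\lnot(\lnot C \downharpoonright^- A^c)$. We get $I_{\{3\}} \equiv v_{g(h(b))} \ne v_b$ and $I_{\{2,3\}} \equiv \bot$:

$$\begin{array}{l|ccccc} \text{lemma} & I_{\{1\}} & I_{\{2\}} & I_{\{3\}} & I_{\{2,3\}} & I_{\{1,2,3\}} \\ \hline g(h(b)) \ne b \lor f(g(h(b))) = f(b) & \top & v_{g(h(b))} = v_b & v_{g(h(b))} \ne v_b & \bot & \bot \end{array}$$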


Having computed the partial tree interpolants for all leaves in the proof
tree, we now compute the partial tree interpolants for each resolvent. If the
colour of the pivot literal $\ell$ is in the $A$-part, i.e., $\mathrm{colour}(\ell) \in A$, the partial tree
interpolant of the resolvent is the disjunction of the partial tree interpolants
of its antecedents. Otherwise, if $\mathrm{colour}(\ell) \in A^c$, we build the conjunction of
the partial tree interpolants of its antecedents. In the resolution step for the
resolvent clause $C_3 \equiv g(h(b)) \le b$, the pivot literal is assigned to partition 1,
i.e., $\mathrm{colour}(\forall x.\ g(h(x)) \le x) = 1$. To obtain $I^3_{\{1\}}$, we hence build the disjunction
of the partial interpolants of the antecedents $C_1 \equiv \forall x.\ g(h(x)) \le x$ and
$C_2 \equiv \lnot(\forall x.\ g(h(x)) \le x) \lor g(h(b)) \le b$, so we get $I^3_{\{1\}} \equiv I^1_{\{1\}} \lor I^2_{\{1\}} \equiv v_{g(h(b))} \le v_b$.
Similarly, we obtain $I^3_{\{2\}}$, $I^3_{\{3\}}$ and $I^3_{\{2,3\}}$ by conjoining the respective partial
interpolants. Since the top-left interpolant is only $\top$ or $\bot$ and the colouring
of the pivot literal ensures that we either build the conjunction with $\top$ or the
disjunction with $\bot$, the resulting tree interpolant of the resolvent is the same as
for the top-right clause. The variables $v_{g(h(b))}$ and $v_b$ are both supported by $C_3$
and thus allowed to appear in the partial interpolant. The resolution steps of
the other inner nodes are very similar in that their partial interpolants always
equal the partial interpolant of one of their antecedents. To summarise:

$$\begin{array}{l|ccccc} \text{resolvent} & I_{\{1\}} & I_{\{2\}} & I_{\{3\}} & I_{\{2,3\}} & I_{\{1,2,3\}} \\ \hline g(h(b)) \le b & v_{g(h(b))} \le v_b & v_{g(h(b))} > v_b & \top & v_{g(h(b))} > v_b & \bot \\ g(h(b)) = b \lor \lnot(g(h(b)) \ge b) & v_{g(h(b))} \le v_b & v_{g(h(b))} > v_b & \top & v_{g(h(b))} > v_b & \bot \\ g(h(b)) \ge b & \top & \bot & \top & \bot & \bot \\ g(h(b)) = b & v_{g(h(b))} \le v_b & v_{g(h(b))} > v_b & \top & v_{g(h(b))} > v_b & \bot \\ f(g(h(b))) \ne f(b) & \top & \top & \bot & \bot & \bot \\ g(h(b)) \ne b & \top & v_{g(h(b))} = v_b & v_{g(h(b))} \ne v_b & \bot & \bot \end{array}$$

The last resolution step is a bit more involved. We have already computed
the tree interpolant for partition 1 in Example 4 as $I_{\{1\}} \equiv \forall x.\exists y.\ g(y) \le x$.
For partition 2, the disjunction $v_{g(h(b))} > v_b \lor v_{g(h(b))} = v_b$ can be simplified to
$v_{g(h(b))} \ge v_b$. The outermost variable $v_{g(h(b))}$ is then replaced by $g(v_{h(b)})$, since
$g$ occurs in 1 and 2. Then for $v_{h(b)}$ a universal quantifier is introduced, since $h$
only occurs in partition 1, resulting in $\forall y.\ g(y) \ge v_b$. Finally, $v_b$ is replaced by
$b$, since it occurs in both 2 and 3. This results in $I_{\{2\}} \equiv \forall y.\ g(y) \ge b$. We omit
the computation of the partial interpolant for partition 3 and the node 23. The
partial tree interpolant computed in this step is the tree interpolant of the full
interpolation problem:

$$I_{\{1,2,3\}} \equiv \bot, \quad I_{\{2,3\}} \equiv \exists x.\forall y.\ g(y) > x, \quad I_{\{1\}} \equiv \forall x.\exists y.\ g(y) \le x, \quad I_{\{2\}} \equiv \forall y.\ g(y) \ge b, \quad I_{\{3\}} \equiv \forall y.\ g(y) \ne b.$$


6 Combination with Equality-Interpolating Theories 


In Sects. 4 and 5, we assign each literal to exactly one partition, such that we can
apply McMillan’s algorithm to combine partial interpolants of the antecedents
to obtain a partial interpolant for the resolvent. In the presence of
equality-interpolating theories [25], we can also allow for mixed literals where only
outermost terms must be assigned to one partition. More precisely, we can allow for
equalities $t_1 = t_2$ where the left-hand side $t_1$ is in one partition and the
right-hand side $t_2$ in another, or linear constraints of the form $c_1 \cdot t_1 + \dots + c_n \cdot t_n \bowtie c_0$
with constants $c_i$ and $\bowtie \in \{=, \le, <, \ge, >\}$, where each $t_i$ is assigned to one
partition. Such literals can be treated by applying proof tree preserving tree
interpolation [5].

A mixed literal $\ell \equiv t_1 = t_2$ is coloured with two colours $p_1$ and $p_2$, so
that each colour can be chosen to contain the outermost symbols of $t_1$ and $t_2$,
respectively. The projections are $\ell \downharpoonright^- p_1 \equiv v_{t_1} = v_\ell$, $\ell \downharpoonright^- p_2 \equiv v_\ell = v_{t_2}$, and
for the negated literal $\lnot\ell \downharpoonright^- p_1 \equiv EQ_1(v_\ell, v_{t_1})$ and $\lnot\ell \downharpoonright^- p_2 \equiv EQ_2(v_\ell, v_{t_2})$,
where $v_\ell$ is a fresh variable and $EQ_1$, $EQ_2$ are shared uninterpreted predicates
with $\forall x, y.\ \lnot(EQ_1(x, y) \land EQ_2(x, y))$, that are only used for the interpolation
algorithm. The partial interpolants for a lemma containing mixed literals will
contain the auxiliary variable $v_\ell$. If a negated mixed equality occurs in the conflict
(the negated lemma), we further require that $v_\ell$ occurs only in literals of the
form $EQ_i(v_\ell, s)$ for some shared term $s$. Valid interpolants will naturally have
this shape, as the interpolated conflict also contains $v_\ell$ only as first parameter of
an $EQ_i$. We then introduce a new combination rule in the first part of interpolating
resolution steps: For a mixed literal $\ell$, the two interpolants $I^1[EQ_i(v_\ell, s)]$
and $I^2(v_\ell)$ are combined to $I^1[I^2(s)]$, i.e., interpolant $I^2(s)$ replaces the $EQ$-literals
occurring in the interpolant $I^1$ to form the resolvent interpolant. This
eliminates the variable $v_\ell$ without introducing a quantifier. The remaining part
is unchanged, i.e., we still introduce quantifiers for unsupported flattening variables.
A proof that the first step produces a valid resolvent interpolant can be
found in [5]. This method produces quantifier-free interpolants if the input formulas
were quantifier-free. An example for this method can be found in [13].


7 Implementation in SMTInterpol 


We implemented the algorithm in SMTInterpol² [6] with a few alterations. First,
we used the combination with equality-interpolating theories described in the
previous section. Second, we do not apply flattening explicitly. Instead of using
an auxiliary variable, the interpolation algorithms for the lemmas include the
corresponding term directly. This may result in an interpolant that contains
symbols that are not allowed, because the auxiliary variable was
shared but its corresponding function symbol is local to one partition. Only
in that case, we introduce the fresh variables for these subterms and replace
the offending subterm in the interpolant with its variable. This creates the same
interpolants as our presented algorithm, because the latter replaces each variable
that stands for a shared function symbol by its definition in the end.

SMTInterpol also supports literals that are shared. If this is done naively,
the computed interpolants may violate the tree inductivity property (third property
in Definition 1). We solve this by treating each literal as occurring in one
designated partition when interpolating a lemma (minimizing the number of
alternating chains in transitivity lemmas). We then apply Pudlák’s resolution
rule [21] that has a case for shared literals. Our implementation colours each input
literal with all partitions it occurs in. For new terms created in the proof, the
colour that matches most of the outermost function symbols is chosen. If the term
uses only symbols from one partition, then it is coloured with that partition.
Equalities and inequalities between terms of different partitions are handled
with the equality-interpolating procedure to avoid introducing quantifiers when
it is not necessary.


8 Conclusion 


We presented a tree interpolation algorithm for SMT formulas with quantifiers. 
The key idea is to virtually flatten each conflict corresponding to a clause in the 
resolution proof, such that the literals in the flattened version are non-mixed 
and can be assigned to the different partitions. The colouring of the original 
literals can even be chosen arbitrarily. Depending on the assigned colours, partial 
interpolants may contain flattening variables that bridge different partitions, 
which eventually must be bound by quantifiers. 

Our algorithm computes tree interpolants from a single, non-local proof of
unsatisfiability obtained independently of the partitioning of the interpolation
problem. It supports quantifiers and arbitrary SMT theories, given that the
theory itself supports tree interpolation for its lemmas, and we provided the
algorithms for the theory of equality and the theory of linear rational arithmetic.

Correctness proofs for our algorithm are available in [14]. The algorithm is
implemented in the open-source SMT solver SMTInterpol.

² Official webpage: https://ultimate.informatik.uni-freiburg.de/smtinterpol/
Code available under LGPLv3 at https://github.com/ultimate-pa/smtinterpol.


Abstract. There are many techniques and tools to prove termination
of C programs, but up to now these tools were not very powerful for fully 
automated termination proofs of programs whose termination depends 
on recursive data structures like lists. We present the first approach that 
extends powerful techniques for termination analysis of C programs (with 
memory allocation and explicit pointer arithmetic) to lists. 


1 Introduction 


In [11,16,17,25], we introduced an approach for automatic termination analysis 
of C that also handles programs whose termination relies on the relation between 
allocated memory addresses and the data stored at such addresses. This approach 
is implemented in our tool AProVE [14]. Instead of analyzing C directly, AProVE 
compiles the program to LLVM code using Clang [9]. Then it constructs a (finite) 
symbolic execution graph (SEG) such that every program run corresponds to a 
path through the SEG. AProVE proves memory safety during the construction 
of the SEG to ensure absence of undefined behavior (which would also allow 
non-termination). Afterwards, the SEG is transformed into an integer transition 
system (ITS) such that all paths through the SEG (and hence, the C program) 
are terminating if the ITS is terminating. To analyze termination of the ITS, 
AProVE applies standard techniques and calls the tools T2 [7] and LoAT [12,13]
to detect non-termination of ITSs. However, like other termination tools for C, up
to now AProVE supported dynamic data structures only in a very restricted way.

    #include <stddef.h>   /* offsetof, NULL */
    #include <stdlib.h>   /* malloc */

    /* returns a nondeterministic value; provided by the analysis environment */
    extern unsigned int nondet_uint(void);

    struct list {
      unsigned int value;
      struct list* next; };

    int main() {
      unsigned int n = nondet_uint();
      // initialize list of length n
      struct list* tail = NULL;
      struct list* curr;
      for (unsigned int k = 0; k < n; k++) {
        curr = malloc(sizeof(struct list));
        curr->value = nondet_uint();
        curr->next = tail;
        tail = curr; }
      // traverse list
      struct list* ptr = tail;
      while(ptr != NULL) {
        ptr = *((struct list**)((void*)ptr +
                offsetof(struct list, next)));}}

In this paper, we introduce a novel technique to analyze C programs on lists. In the
program above, nondet_uint returns a random unsigned integer. The for loop
creates a list of n random numbers if n > 0. The while loop traverses this list via
pointer arithmetic: Starting with tail, it computes the address of the next field of the
current element by adding the offset of the next field within a list to the address
of the current list and dereferencing the computed address (i.e., the content of the
next field). This is done by offsetof, defined in the C library stddef.h.¹ Since
the list is acyclic and the next pointer of its last element is the null pointer, list 
traversal always terminates. Of course, the while loop could also traverse the list 
via ptr = ptr->next, but in C, memory accesses can be combined with pointer 
arithmetic. This example contains both the access via curr->next (when initial- 
izing the list) and pointer arithmetic (when traversing the list). 

We present a new general technique to infer list invariants via symbolic 
execution, which express all properties that are crucial for memory safety and 
termination. In our example, the list invariant contains the information that 
dereferencing the next pointer in the while loop is safe and that one finally 
reaches the null pointer. In general, our novel list invariants allow us to abstract 
from detailed information about lists (e.g., about their intermediate elements) 
such that abstract states with “similar” lists can be merged and generalized 
during the symbolic execution in order to obtain finite SEGs. At the same time, 
list invariants express enough information about the lists (e.g., their length, their 
start address, etc.) such that memory safety and termination can still be proved. 

We define the abstract states used for symbolic execution in Sect. 2. In Sect. 3, 
after recapitulating the construction of SEGs, we adapt our techniques for merg- 
ing and generalizing states from [25] to infer list invariants. Moreover, we adapt 
those rules for symbolic execution that are affected by introducing list invariants. 
Section 4 discusses the generation of ITSs and the soundness of our approach. 
Section 5 gives an overview on related work. Moreover, we evaluate the implemen- 
tation of our approach in the tool AProVE using benchmark sets from SV-COMP 
[3] and the Termination Competition [15]. All proofs can be found in [18]. 


Limitations. To ease the presentation, in this paper we treat integer types as
unbounded. Moreover, we assume that a program consists of a single non-recursive
function and that values may be stored at any address. Our approach can also
deal with bitvectors, data alignments, and programs with arbitrarily many (possibly
recursive) functions, see [11,16,25] for details. However, so far only lists without
sharing can be handled by our new technique. Extending it to more general
recursive data structures is one of the main challenges for future work.


2 Abstract States for Symbolic Execution 


The LLVM code for the for loop is given below. It is equivalent to the
code produced by Clang without optimizations on a 64-bit computer. We explain
it in detail in Sect. 3. To ease readability, we omitted instructions and keywords
that are irrelevant for our presentation, renamed variables, and wrote list
instead of struct.list. Moreover, we gave the C instructions before
the corresponding LLVM code. The code consists of several basic blocks including
cmpF and bodyF (corresponding to the loop comparison and body).

¹ Note that ptr + n increases ptr by n times the size of the type *ptr. As we want
to increase ptr by a number of bytes and ptr is not an i8 pointer, we first cast ptr
to void*. Then ((void*)ptr + offsetof(struct list, next)) is the address of the next
pointer, so we cast our computed address to struct list** before dereferencing it.

    list = type { i32, list* }

    define i32 @main() { ...
    cmpF:
       0: k = load i32, i32* k_ad
       1: kltn = icmp ult i32 k, n
       2: br i1 kltn, label bodyF, label initPtr
    bodyF:
          // curr = malloc(sizeof(struct list));
       0: mem = call i8* @malloc(i64 16)
       1: curr = bitcast i8* mem to list*
          // curr->value = nondet_uint();
       2: nondet = call i32 @nondet_uint()
       3: curr_val = getelementptr list, list* curr, i32 0, i32 0
       4: store i32 nondet, i32* curr_val
          // curr->next = tail;
       5: tail = load list*, list** tail_ptr
       6: curr_next = getelementptr list, list* curr, i32 0, i32 1
       7: store list* tail, list** curr_next
          // tail = curr;
       8: store list* curr, list** tail_ptr
          // k++
       9: kinc = add i32 k, 1
      10: store i32 kinc, i32* k_ad
      11: br label cmpF
    ... }

We now recapitulate the abstract states of [25] used for symbolic execution
and extend them by a component LI for list invariants, i.e., they have
the form $((b, i), LV, AL, PT, LI, KB)$. The first component is a program
position $(b, i)$, indicating that instruction $i$ of block $b$ is executed next.
$Pos \subseteq (Blks \times \mathbb{N})$ is the set of all program positions, and $Blks$ are all
basic blocks.

The second component is a partial injective function $LV : V_P \rightharpoonup V_{sym}$,
which maps local program variables $V_P$ of the program $P$ to an infinite
set $V_{sym}$ of symbolic variables with $V_{sym} \cap V_P = \emptyset$. We identify $LV$ with
the set of equations $\{x = LV(x) \mid x \in domain(LV)\}$ and we often extend $LV$
to a function from $V_P \uplus \mathbb{N}$ to $V_{sym} \uplus \mathbb{N}$ by defining $LV(n) = n$ for all $n \in \mathbb{N}$.



The third component of each state
is a set AL of (bytewise) allocations $[v_1, v_2]$ with $v_1, v_2 \in V_{sym}$, which indicate
that $v_1 \le v_2$ and that all addresses between $v_1$ and $v_2$ have been allocated. We
require any two entries $[v_1, v_2]$ and $[w_1, w_2]$ from AL with $v_1 \ne w_1$ or $v_2 \ne w_2$ to
be disjoint.

The fourth and fifth components PT and LI model the memory contents.
PT contains "points-to" entries of the form $v_1 \hookrightarrow_{ty} v_2$ where $v_1, v_2 \in V_{sym}$ and
$ty$ is an LLVM type, meaning that the address $v_1$ of type $ty$ points to $v_2$. In
contrast, the set LI of list invariants (which is new compared to [25]) does not
describe pointwise memory contents but contains invariants

$$v_{ad} \hookrightarrow^{v_\ell}_{ty} [(\mathit{off}_i : ty_i : v_i..\hat{v}_i)]_{i=1}^{n},$$

where $n \in \mathbb{N}_{>0}$, $v_{ad}, v_\ell, v_i, \hat{v}_i \in V_{sym}$, $\mathit{off}_i \in \mathbb{N}$ for all $1 \le i \le n$, $ty$ and
$ty_i$ are LLVM types for all $1 \le i \le n$, and there is exactly one "recursive field"
$1 \le j \le n$ such that $ty_j = ty*$.² Such an invariant represents a struct $ty$ with $n$
fields that corresponds to a recursively defined list of length $v_\ell$. Here, $v_{ad}$ points
to the first list element, the $i$-th field starts at address $v_{ad} + \mathit{off}_i$ (i.e., with offset
$\mathit{off}_i$)³ and has type $ty_i$, and the values of the $i$-th fields of the first and last list
element are $v_i$ and $\hat{v}_i$, respectively. For example, the following list invariant
(1) represents all lists of length $x_\ell$ and type list whose elements store a 32-bit
integer in their first field and the pointer to the next element in their second field
with offset 8. The first list element starts at address $x_{mem}$, the second starts at
address $x_{next}$, and the last element contains the null pointer. Moreover, the first
element stores the integer value $x_{nd}$ and the last list element stores the integer $\hat{x}_{nd}$.

$$x_{mem} \hookrightarrow^{x_\ell}_{\mathtt{list}} [(0 : \mathtt{i32} : x_{nd}..\hat{x}_{nd}),\ (8 : \mathtt{list*} : x_{next}..0)] \qquad (1)$$

For example, this invariant represents the list with the allocation $[x_{mem}, x_{mem} + 15]$,
where the first four bytes store the integer 5 and the last eight bytes store the
pointer $x_{next}$, and the allocation $[x_{next}, x_{next} + 15]$, where the first four bytes store
the integer 2 and the last eight bytes store the null pointer (i.e., the address 0).
Here, we have $x_\ell = 2$. Section 3.2.2 will show that the expressiveness of our list
invariants is indeed needed to prove termination of programs that traverse a list.

² Soundness of our approach is not affected if there are other recursive fields, but our
symbolic execution technique for list traversal on list invariants in Sect. 3.2.2 can
only be applied if the traversal is done along field $j$.

³ The field offsets can be computed using the data layout string in the LLVM program.

The last component of a state is a knowledge base KB of quantifier-free first-order
formulas that express integer arithmetic properties of $V_{sym}$. We identify
sets of first-order formulas $\{\varphi_1, \ldots, \varphi_m\}$ with their conjunction $\varphi_1 \wedge \ldots \wedge \varphi_m$.

A special state ERR is reached if we cannot prove absence of undefined beha- 
vior (e.g., if memory safety might be violated by dereferencing the null pointer). 

As an example, the following abstract state (2) represents concrete states at
the beginning of the block cmpF, where the program variable curr is assigned the
symbolic variable $x_{mem}$, the allocation $[x_{k\_ad}, x^{end}_{k\_ad}]$ consisting of 4 bytes stores
the value $x_{kinc}$, and $x_{mem}$ points to the first element of a list of length $x_\ell$ (equal
to $x_{kinc}$) that satisfies the list invariant (1). (This state will later be obtained
during the symbolic execution, see State O in Fig. 3 in Sect. 3.1.)

$$\begin{array}{l}
((\mathtt{cmpF},0),\ \{\mathtt{curr} = x_{mem},\ \mathtt{kinc} = x_{kinc}, \ldots\},\ \{[x_{k\_ad}, x^{end}_{k\_ad}], \ldots\},\ \{x_{k\_ad} \hookrightarrow_{\mathtt{i32}} x_{kinc}, \ldots\},\\
\ \ \{x_{mem} \hookrightarrow^{x_\ell}_{\mathtt{list}} [(0 : \mathtt{i32} : x_{nd}..\hat{x}_{nd}),\ (8 : \mathtt{list*} : x_{next}..0)]\},\ \{x^{end}_{k\_ad} = x_{k\_ad} + 3,\ x_\ell = x_{kinc}, \ldots\}) \qquad (2)
\end{array}$$


A state $s = (p, LV, AL, PT, LI, KB)$ is represented by a formula $\langle s \rangle$ which
contains KB and encodes AL, PT, and LI in first-order logic. This allows us to
use standard SMT solving for all reasoning during the construction of the SEG.
Moreover, $\langle s \rangle$ is also used for the generation of the ITS afterwards. The encoding
of AL and PT is as in [25], see [18]: $\langle s \rangle$ contains formulas which express
that allocated addresses are positive, that allocations represent disjoint memory
areas, that equal addresses point to equal values, and that addresses are different
if they point to different values. For each element of LI, we add the following
new formulas to $\langle s \rangle$ which express that the list length $v_\ell$ is $\ge 1$ and the
address $v_{ad}$ of the first element is not null. If $v_\ell = 1$, then the values $v_i$ and $\hat{v}_i$ of the
fields in the first and the last element are equal. If $v_\ell \ge 2$, then the next pointer
$v_j$ in the first element must not be null. Finally, if there is a field whose values $v_k$
and $\hat{v}_k$ differ in the first and the last element, then the length $v_\ell$ must be $\ge 2$.

$$\begin{array}{l}
\{v_\ell \ge 1 \wedge v_{ad} \ge 1 \mid (v_{ad} \hookrightarrow^{v_\ell}_{ty} [(\mathit{off}_i : ty_i : v_i..\hat{v}_i)]_{i=1}^{n}) \in LI\}\ \cup\\
\{\bigwedge_{i=1}^{n} v_i = \hat{v}_i \mid (v_{ad} \hookrightarrow^{v_\ell}_{ty} [(\mathit{off}_i : ty_i : v_i..\hat{v}_i)]_{i=1}^{n}) \in LI \text{ and } \models \langle s \rangle \Rightarrow v_\ell = 1\}\ \cup\\
\{v_j \ge 1 \mid (v_{ad} \hookrightarrow^{v_\ell}_{ty} [(\mathit{off}_i : ty_i : v_i..\hat{v}_i)]_{i=1}^{n}) \in LI \text{ with } ty_j = ty* \text{ and } \models \langle s \rangle \Rightarrow v_\ell \ge 2\}\ \cup\\
\{v_\ell \ge 2 \mid (v_{ad} \hookrightarrow^{v_\ell}_{ty} [(\mathit{off}_i : ty_i : v_i..\hat{v}_i)]_{i=1}^{n}) \in LI \text{ and } \exists k \in \mathbb{N}_{>0},\ k \le n, \text{ s.t. } \models \langle s \rangle \Rightarrow v_k \ne \hat{v}_k\}
\end{array}$$


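In concrete states $c$, all values of variables and memory contents are determined
uniquely. To ease the formalization, we assume that all integers are
unsigned and refer to [16] for the general case. So for all $v \in V_{sym}(c)$ (i.e., all
$v \in V_{sym}$ occurring in $c$) we have $\models \langle c \rangle \Rightarrow v = n$ for some $n \in \mathbb{N}$. Moreover, here
PT only contains information about allocated addresses and $LI = \emptyset$ since the
abstract knowledge in list invariants is unnecessary if all memory contents are
known.

A: $((\mathtt{entry},0),\ \emptyset,\ \emptyset,\ \emptyset,\ \emptyset,\ \emptyset)$

B: $((\mathtt{cmpF},0),\ \{\mathtt{n} = v_n,\ \mathtt{tail\_ptr} = v_{tp},\ \mathtt{k\_ad} = v_{k\_ad}, \ldots\},\ \{[v_{tp}, v^{end}_{tp}],\ [v_{k\_ad}, v^{end}_{k\_ad}]\},$
   $\{v_{tp} \hookrightarrow_{\mathtt{list*}} 0,\ v_{k\_ad} \hookrightarrow_{\mathtt{i32}} 0\},\ \emptyset,\ \{v^{end}_{tp} = v_{tp} + 7,\ v^{end}_{k\_ad} = v_{k\_ad} + 3, \ldots\})$

C: $((\mathtt{cmpF},1),\ \{\mathtt{k} = 0, \ldots\},\ AL^B,\ PT^B,\ \emptyset,\ KB^B)$

D: $((\mathtt{cmpF},1),\ \{\mathtt{k} = 0, \ldots\},\ AL^B,\ PT^B,\ \emptyset,\ \{v_n > 0, \ldots\})$

E: $((\mathtt{cmpF},1),\ \{\mathtt{k} = 0, \ldots\},\ AL^B,\ PT^B,\ \emptyset,\ \{v_n \le 0, \ldots\})$

F: $((\mathtt{cmpF},2),\ \{\mathtt{kltn} = 1, \ldots\},\ AL^B,\ PT^B,\ \emptyset,\ KB^D)$

G: $((\mathtt{bodyF},0),\ LV^F,\ AL^B,\ PT^B,\ \emptyset,\ KB^D)$

H: $((\mathtt{bodyF},1),\ \{\mathtt{mem} = v_{mem}, \ldots\},\ \{[v_{mem}, v^{end}_{mem}], \ldots\},\ PT^B,\ \emptyset,\ \{v^{end}_{mem} = v_{mem} + 15, \ldots\})$

J: $((\mathtt{bodyF},7),\ \{\mathtt{curr} = v_{mem},\ \mathtt{nondet} = v_{nd},\ \mathtt{curr\_val} = v_{mem},\ \mathtt{tail} = 0,\ \mathtt{curr\_next} = v_{cn}, \ldots\},$
   $AL^H,\ \{v_{mem} \hookrightarrow_{\mathtt{i32}} v_{nd}, \ldots\},\ \emptyset,\ \{v_{cn} = v_{mem} + 8, \ldots\})$

K: $((\mathtt{bodyF},11),\ \{\mathtt{kinc} = 1, \ldots\},\ AL^H,\ \{v_{cn} \hookrightarrow_{\mathtt{list*}} 0,\ v_{tp} \hookrightarrow_{\mathtt{list*}} v_{mem},\ v_{k\_ad} \hookrightarrow_{\mathtt{i32}} 1, \ldots\},\ \emptyset,\ KB^J)$

Fig. 1. SEG for the First Iteration of the for Loop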

For instance, all concrete states $((\mathtt{cmpF},0), LV, AL, PT, \emptyset, KB)$ represented
by the state (2) contain $\ell$ allocations of 16 bytes for some $\ell \ge 1$, where in the
first four bytes a 32-bit integer is stored and in the last eight bytes the address
of the next allocation (or 0, in case of the last allocation) is stored.

See [18] for a formal definition to determine which concrete states are represented
by a state $s$. To this end, as in [25] we define a separation logic formula
$\langle s \rangle_{SL}$ which also encodes the knowledge contained in the memory components
of states. To extend this formula to list invariants, we use a fragment similar to
quantitative separation logic [4], extending conventional separation logic by list
predicates. For any state $s$, we have $\models \langle s \rangle_{SL} \Rightarrow \langle s \rangle$, i.e., $\langle s \rangle$ is a weakened version
of $\langle s \rangle_{SL}$ that we use for symbolic execution and the termination proof.


3 Symbolic Execution with List Invariants 


We first recapitulate the construction of SEGs. Then, Sect. 3.1 extends the tech- 
nique for merging and generalization of states from [25] to infer list invariants. 
Finally, we adapt the rules for symbolic execution to list invariants in Sect. 3.2. 

Our symbolic execution starts with a state A at the first instruction of the 
first block (called entry in our example). Figure 1 shows the first iteration of the
for loop. Dotted arrows indicate that we omitted some symbolic execution steps.
For every state, we perform symbolic execution by applying the corresponding
inference rule as in [25] to compute its successor state(s) and repeat this until
all paths end in return states. We call an SEG with this property complete.

As an example, we recapitulate the inference rule for the load instruction in
the case where a value is loaded from allocated and initialized memory. It loads
the value of type $ty$ that is stored at the address ad to the program variable x.
Let $size(ty)$ denote the size of $ty$ in bytes for any LLVM type $ty$. If we can prove
that there is an allocation $[v_1, v_2]$ containing all addresses $LV(\mathtt{ad}), \ldots, LV(\mathtt{ad}) +
size(ty) - 1$ and there exists an entry $(w_1 \hookrightarrow_{ty} w_2) \in PT$ such that $w_1$ is equal
to the address $LV(\mathtt{ad})$ loaded from, then we transform the state $s$ at position
$p = (b, i)$ to a state $s'$ at position $p^+ = (b, i+1)$. In $s'$, a fresh symbolic variable $w$
is assigned to x and $w = w_2$ is added to KB. We write $LV[\mathtt{x} := w]$ for the function
where $LV[\mathtt{x} := w](\mathtt{x}) = w$ and $LV[\mathtt{x} := w](\mathtt{y}) = LV(\mathtt{y})$ for all $\mathtt{y} \ne \mathtt{x}$.


load from initialized allocated memory (p: "x = load ty, ty* ad", $\mathtt{x}, \mathtt{ad} \in V_P$)

$$\frac{s = (p,\ LV,\ AL,\ PT,\ LI,\ KB)}{s' = (p^+,\ LV[\mathtt{x} := w],\ AL,\ PT,\ LI,\ KB \cup \{w = w_2\})}$$

if $w \in V_{sym}$ is fresh and

• there is $[v_1, v_2] \in AL$ with $\models \langle s \rangle \Rightarrow (v_1 \le LV(\mathtt{ad}) \wedge LV(\mathtt{ad}) + size(ty) - 1 \le v_2)$
• there are $w_1, w_2 \in V_{sym}$ with $\models \langle s \rangle \Rightarrow (LV(\mathtt{ad}) = w_1)$ and $(w_1 \hookrightarrow_{ty} w_2) \in PT$


In our example, the entry block comprises the first three lines of the C
program and the initialization of the pointer to the loop variable k: First, a
non-deterministic unsigned integer is assigned to n, i.e., $(\mathtt{n} = v_n) \in LV^B$, where $v_n$ is not
restricted. Moreover, memory for the pointers tail_ptr and k_ad is allocated
and they point to tail = NULL and k = 0, respectively ($\mathtt{tail\_ptr} = v_{tp}$ and
$\mathtt{k\_ad} = v_{k\_ad}$ with $(v_{tp} \hookrightarrow_{\mathtt{list*}} 0), (v_{k\_ad} \hookrightarrow_{\mathtt{i32}} 0) \in PT^B$). For simplicity, in Fig. 1
we use concrete values directly instead of introducing fresh variables for them.
Since we assume a 64-bit architecture, tail_ptr's allocation contains 8 bytes.
For the integer value of k, only 4 bytes are allocated. Alignments and pointer
sizes depend on the memory layout and are given in the LLVM program.

State C results from B by evaluating the load instruction at (cmpF,0), see 
the above load rule. Thus, there is an evaluation edge from B to C. 

The next instruction is an integer comparison whose Boolean return value
depends on whether the unsigned value of k is less than the one of n. If we
cannot decide the validity of a comparison, we refine the state into two successor
states. Thus, the states D and E (with $(v_n > 0) \in KB^D$ and $(v_n \le 0) \in KB^E$) are
reached by refinement edges from State C. Evaluating D yields kltn = 1 in F.
Therefore, the branch instruction leads to the block bodyF in State G. State E
is evaluated to a state with kltn = 0. This path branches to the block initPtr
and terminates quickly as tail_ptr points to an empty list.

The instruction at (bodyF,0) allocates 16 bytes of memory starting at $v_{mem}$
in State H. The next instruction casts the pointer to the allocation from i8*
to list* and assigns it to curr. Now the allocated area can be treated as a list
element. Then nondet_uint() is invoked to assign a 32-bit integer to nondet.
The getelementptr instruction computes the address of the integer field of the
list element by indexing this field (the second i32 0) based on the start address
(curr). The first index (i32 0) specifies that a field of *curr itself is computed
and not of another list stored after *curr. Since the address of the integer
value of the list element coincides with the start address of the list element, this
instruction assigns $v_{mem}$ to curr_val. Afterwards, the value of nondet is stored
at curr_val ($v_{mem} \hookrightarrow_{\mathtt{i32}} v_{nd}$), the value 0 stored at $v_{tp}$ is loaded to tail, and a
second getelementptr instruction computes the address of the recursive field of
the current list element ($v_{cn} = v_{mem} + 8$) and assigns it to curr_next, leading to state
J. In the path to K, the values of tail and curr are stored at curr_next and
tail_ptr, respectively ($v_{cn} \hookrightarrow_{\mathtt{list*}} 0$, $v_{tp} \hookrightarrow_{\mathtt{list*}} v_{mem}$). Finally, the incremented
value of k is assigned to kinc and stored at k_ad ($v_{k\_ad} \hookrightarrow_{\mathtt{i32}} 1$).

L: $((\mathtt{cmpF},0),\ \{\mathtt{n} = v_n,\ \mathtt{tail\_ptr} = v_{tp},\ \mathtt{mem} = v_{mem},\ \mathtt{curr} = v_{mem},\ \mathtt{nondet} = v_{nd},\ \mathtt{curr\_val} = v_{mem},$
   $\mathtt{curr\_next} = v_{cn},\ \mathtt{k} = 0,\ \mathtt{kinc} = 1, \ldots\},\ \{[v_{tp}, v^{end}_{tp}],\ [v_{k\_ad}, v^{end}_{k\_ad}],\ [v_{mem}, v^{end}_{mem}]\},$
   $\{v_{tp} \hookrightarrow_{\mathtt{list*}} v_{mem},\ v_{k\_ad} \hookrightarrow_{\mathtt{i32}} 1,\ v_{mem} \hookrightarrow_{\mathtt{i32}} v_{nd},\ v_{cn} \hookrightarrow_{\mathtt{list*}} 0\},\ \emptyset,$
   $\{v_n > 0,\ v^{end}_{k\_ad} = v_{k\_ad} + 3,\ v^{end}_{tp} = v_{tp} + 7,\ v^{end}_{mem} = v_{mem} + 15,\ v_{cn} = v_{mem} + 8, \ldots\})$

M: $((\mathtt{cmpF},0),\ \{\mathtt{n} = v_n,\ \mathtt{tail\_ptr} = v_{tp},\ \mathtt{mem} = w_{mem},\ \mathtt{curr} = w_{mem},\ \mathtt{nondet} = w_{nd},\ \mathtt{curr\_val} = w_{mem},$
   $\mathtt{curr\_next} = w_{cn},\ \mathtt{k} = 1,\ \mathtt{kinc} = 2, \ldots\},\ \{[v_{tp}, v^{end}_{tp}],\ [v_{k\_ad}, v^{end}_{k\_ad}],\ [v_{mem}, v^{end}_{mem}],\ [w_{mem}, w^{end}_{mem}]\},$
   $\{v_{tp} \hookrightarrow_{\mathtt{list*}} w_{mem},\ v_{k\_ad} \hookrightarrow_{\mathtt{i32}} 2,\ v_{mem} \hookrightarrow_{\mathtt{i32}} v_{nd},\ v_{cn} \hookrightarrow_{\mathtt{list*}} 0,\ w_{mem} \hookrightarrow_{\mathtt{i32}} w_{nd},\ w_{cn} \hookrightarrow_{\mathtt{list*}} v_{mem}\},\ \emptyset,$
   $\{v_n > 1,\ v^{end}_{k\_ad} = v_{k\_ad} + 3,\ v^{end}_{tp} = v_{tp} + 7,\ v^{end}_{mem} = v_{mem} + 15,\ v_{cn} = v_{mem} + 8,$
   $w^{end}_{mem} = w_{mem} + 15,\ w_{cn} = w_{mem} + 8, \ldots\})$

Fig. 2. Second Iteration of the for Loop

To ensure a finite graph construction, when a program position is reached 
for the second time, we try to merge the states at this position to a generalized 
state. However, this is only meaningful if the domains of the LV functions of 
the two states coincide (i.e., the states consider the same program variables). 
Therefore, after the branch from the loop body back to cmpF (see State L in Fig.
2), we evaluate the loop a second time and reach M. Here, a second list element
with value $w_{nd}$ and a next pointer $w_{cn}$ pointing to $v_{mem}$ has been stored at a new
allocation $[w_{mem}, w^{end}_{mem}]$. Now, curr points to the new element and k has been
incremented again, so k_ad points to 2.

    L:  v_mem -> | value: v_nd | next: 0 |
    M:  w_mem -> | value: w_nd | next: * | -> | value: v_nd | next: 0 |


3.1 Inferring List Invariants and Generalization of States 


As mentioned, our goal is to merge L and M to a more general state O that represents
all states which are represented by L or M. The challenging part during
generalization is to find loop invariants automatically that always hold at this po- 
sition and provide sufficient information to prove termination of the loop. For O, 
we can neither use the information that curr points to a struct whose next field 
contains the null pointer (as in L), nor that its next field points to another 
struct whose next field contains the null pointer (as in M). 

With the approach of [25], when merging states like L and M where a list has 
different lengths, the merged state would only contain those list elements that
are allocated in both states (often this is only the first element). Then elements
which are the null pointer in one but not in the other state are lost. Hence, proving
memory safety (and thus, also termination) fails when the list is traversed
afterwards, since now there might be next pointers to non-allocated memory.

We solve this problem by introducing list invariants. In our example, we will
infer an invariant stating that curr points to a list of length $x_\ell \ge 1$. This invariant
also implies that all struct fields are allocated and that there is no sharing.

To this end, we adapt the merging heuristic from [25]. To merge two states
$s$ and $s'$ at the same program position with $domain(LV^s) = domain(LV^{s'})$, we
introduce a fresh symbolic variable $x_{var}$ for each program variable var and use
instantiations $\mu_s$ and $\mu_{s'}$ which map $x_{var}$ to the corresponding symbolic variables
of $s$ and $s'$. For the merged state $\bar{s}$, we set $LV^{\bar{s}}(\mathtt{var}) = x_{var}$. Moreover, we identify
corresponding variables that only occur in the memory components and extend
$\mu_s$ and $\mu_{s'}$ accordingly. In a second step, we check which constraints from the
memory components and the knowledge base hold in both states in order to find
invariants that we can add to the memory components and the knowledge base
of $\bar{s}$. For example, if $[\mu_s(x), \mu_s(x^{end})] \in AL^s$ and $[\mu_{s'}(x), \mu_{s'}(x^{end})] \in AL^{s'}$ for
$x, x^{end} \in V_{sym}$, then $[x, x^{end}]$ is added to $AL^{\bar{s}}$. To extend this heuristic to lists,
we have to regard several memory entries together. If there is an $\mathtt{ad} \in V_P$ such
that $\mu_s(x_{ad}) = v^{start}_1$ and $\mu_{s'}(x_{ad}) = w^{start}_1$ both point to lists of type $ty$ but of
different lengths $\ell_s \ne \ell_{s'}$ with $\ell_s, \ell_{s'} \ge 1$, then we create a list invariant.

For a state $s$ we say that $v^{start}_1$ points to a list of type $ty$ with $n$ fields and
length $\ell_s$ with allocations $[v^{start}_k, v^{end}_k]$ and values $v_{k,i}$ (for $1 \le k \le \ell_s$ and
$1 \le i \le n$) if the following conditions (a) to (d) hold:

(a) $ty$ is an LLVM struct type with subtypes $ty_i$ and field offsets $\mathit{off}_i \in \mathbb{N}$ for all
    $1 \le i \le n$ such that there exists exactly one $1 \le j \le n$ with $ty_j = ty*$.

(b) There exist pairwise different $[v^{start}_k, v^{end}_k] \in AL^s$ for all $1 \le k \le \ell_s$ and
    $\models \langle s \rangle \Rightarrow v^{end}_k = v^{start}_k + size(ty) - 1$.

(c) For all $1 \le k \le \ell_s$ and $1 \le i \le n$ there exist $v^{start}_{k,i}, v_{k,i} \in V_{sym}$ with
    $\models \langle s \rangle \Rightarrow v^{start}_{k,i} = v^{start}_k + \mathit{off}_i$ and $(v^{start}_{k,i} \hookrightarrow_{ty_i} v_{k,i}) \in PT^s$.

(d) For all $1 \le k < \ell_s$ we have $\models \langle s \rangle \Rightarrow v_{k,j} = v^{start}_{k+1}$.


Condition (a) states that $ty$ is a list type with $n$ fields, where the pointer to the
next element is in the $j$-th field. In (b) we ensure that each list element has a unique
allocation of the correct size where $v^{start}_1$ is the start address of the first allocation.
Condition (c) requires that for the $k$-th element, the initial address plus the $i$-th
offset points to a value $v_{k,i}$ of type $ty_i$. Finally, (d) states that the recursive field
of each element indeed points to the initial address of the next element.


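Then, for fresh $x_\ell, x_i, \hat{x}_i \in V_{sym}$, we add the following list invariant to $LI^{\bar{s}}$:

$$x_{ad} \hookrightarrow^{x_\ell}_{ty} [(\mathit{off}_i : ty_i : x_i..\hat{x}_i)]_{i=1}^{n} \qquad (3)$$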


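To ensure that the allocations expressed by the list invariant are disjoint
from all allocations in $AL^{\bar{s}}$, we do not use the list allocations $[v^{start}_k, v^{end}_k]$ to
infer generalized allocations in $AL^{\bar{s}}$. Similarly, to create $PT^{\bar{s}}$, we only use entries
$v \hookrightarrow_{sy} w$ from $PT^s$ and $PT^{s'}$ where $v$ is disjoint from the list addresses, i.e.,
where $\models \langle s \rangle \Rightarrow v < v^{start}_k \vee v > v^{end}_k$ holds for all $1 \le k \le \ell_s$, and analogously for $s'$.
Moreover, we add formulas to $KB^{\bar{s}}$ stating that (A) the length $x_\ell$ of the list is
at least the smaller length of the merged lists, (B) $x_\ell$ is equal to all variables $x$
which result from merging variables $v$ and $w$ that are equal to the lengths $\ell_s$ and
$\ell_{s'}$ in $s$ and $s'$, and (C) the symbolic variable $x_i$ for the value of the $i$-th field of
the first list element is equal to all variables $x$ with $\mu_s(x) = v_{1,i}$ and $\mu_{s'}(x) = w_{1,i}$
where $v_{1,i}$ and $w_{1,i}$ are the values of the $i$-th field of the first list element in $s$
and $s'$ (and analogously for the values $\hat{x}_i$ of the last list element):

(A) $\min(\ell_s, \ell_{s'}) \le x_\ell$

(B) $\bigwedge_{x \in \mu_s^{-1}(v) \cap \mu_{s'}^{-1}(w)} x_\ell = x$ for all $v, w \in V_{sym}$ with $\models \langle s \rangle \Rightarrow v = \ell_s$ and $\models \langle s' \rangle \Rightarrow w = \ell_{s'}$

(C) $\bigwedge_{x \in \mu_s^{-1}(v_{1,i}) \cap \mu_{s'}^{-1}(w_{1,i})} x_i = x$ and $\bigwedge_{x \in \mu_s^{-1}(v_{\ell_s,i}) \cap \mu_{s'}^{-1}(w_{\ell_{s'},i})} \hat{x}_i = x$ for all $1 \le i \le n$

O: $((\mathtt{cmpF},0),\ \{\mathtt{n} = x_n,\ \mathtt{tail\_ptr} = x_{tp},\ \mathtt{mem} = x_{mem},\ \mathtt{curr} = x_{mem},\ \mathtt{nondet} = x_{nd},\ \mathtt{curr\_val} = x_{mem},$
   $\mathtt{curr\_next} = x_{cn},\ \mathtt{k} = x_k,\ \mathtt{kinc} = x_{kinc}, \ldots\},\ \{[x_{tp}, x^{end}_{tp}],\ [x_{k\_ad}, x^{end}_{k\_ad}]\},$
   $\{x_{tp} \hookrightarrow_{\mathtt{list*}} x_{mem},\ x_{k\_ad} \hookrightarrow_{\mathtt{i32}} x_{kinc}\},\ \{x_{mem} \hookrightarrow^{x_\ell}_{\mathtt{list}} [(0 : \mathtt{i32} : x_{nd}..\hat{x}_{nd}),\ (8 : \mathtt{list*} : x_{next}..0)]\},$
   $\{x_n > x_k,\ x^{end}_{k\_ad} = x_{k\_ad} + 3,\ x^{end}_{tp} = x_{tp} + 7,\ x_{cn} = x_{mem} + 8,\ x_{kinc} = x_k + 1,\ 1 \le x_\ell,\ x_\ell = x_{kinc}, \ldots\})$

(State O has incoming edges from the states L and M.)

Fig. 3. Merging of States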


To identify the variables in the list invariant (3) of $\bar{s}$ with the corresponding
values in $s$ and $s'$, the instantiations $\mu_s$ and $\mu_{s'}$ are extended such that $\mu_s(x_\ell) = \ell_s$,
$\mu_{s'}(x_\ell) = \ell_{s'}$, $\mu_s(x_i) = v_{1,i}$, $\mu_{s'}(x_i) = w_{1,i}$, $\mu_s(\hat{x}_i) = v_{\ell_s,i}$, and $\mu_{s'}(\hat{x}_i) = w_{\ell_{s'},i}$ for all
$1 \le i \le n$. Similarly, if there already exist list invariants in $s$ and $s'$, for each
pair of corresponding variables a new variable is introduced and mapped to its
origin by $\mu_s$ and $\mu_{s'}$. This adaptation of the merging heuristic only concerns the
result of merging but not the rules when to merge two states. Thus, the same
reasoning as in [25] can be used to prove soundness and termination of merging.

In our example, L and M contain lists of length $\ell_L = 1$ and $\ell_M = 2$. To
ease the presentation, we re-use variables that are known to be equal instead of
introducing fresh variables. If $x_{mem}$ is the variable for the program variable curr,
we have $\mu_L(x_{mem}) = v_{mem}$ and $\mu_M(x_{mem}) = w_{mem}$. Indeed, $v_{mem}$ resp. $w_{mem}$ points to a
list with values $v_{k,i}$ resp. $w_{k,i}$ as defined in (a) to (d): For the type list with $n = 2$,
$ty_1 = \mathtt{i32}$, $ty_2 = \mathtt{list*}$, $\mathit{off}_1 = 0$, $\mathit{off}_2 = 8$, and $j = 2$ (see (a)), we have $[v_{mem}, v^{end}_{mem}] \in AL^L$
and $[v_{mem}, v^{end}_{mem}], [w_{mem}, w^{end}_{mem}] \in AL^M$, all consisting of $size(\mathtt{list}) = 16$ bytes, see
(b). We have $(v_{mem} \hookrightarrow_{\mathtt{i32}} v_{nd}), (v_{cn} \hookrightarrow_{\mathtt{list*}} 0) \in PT^L$ with $(v_{cn} = v_{mem} + 8) \in KB^L$ and
$(v_{mem} \hookrightarrow_{\mathtt{i32}} v_{nd}), (v_{cn} \hookrightarrow_{\mathtt{list*}} 0), (w_{mem} \hookrightarrow_{\mathtt{i32}} w_{nd}), (w_{cn} \hookrightarrow_{\mathtt{list*}} v_{mem}) \in PT^M$ with
$(v_{cn} = v_{mem} + 8), (w_{cn} = w_{mem} + 8) \in KB^M$ (see (c)), so the first list element in M
points to the second one (see (d)). Therefore, when merging L and M to a new
state O (see Fig. 3), the lists are merged to a list invariant of variable length $x_\ell$,
and we add the formulas (A) $1 \le x_\ell$ and (B) $x_\ell = x_{kinc}$ to $KB^O$. By (C), the
i32 value of the first element is identified with $x_{nd}$, since $\mu_L(x_{nd})$ is equal to the
first value of the first list element in L and $\mu_M(x_{nd})$ is equal to the first value
of the first list element in M. Similarly, the values of the last list elements are
identified with 0, as in L and M.

After merging $s$ and $s'$ to a generalized state $\bar{s}$, we continue symbolic execution
from $\bar{s}$. The next time we reach the same program position, we might have
to merge the corresponding states again. As described in [25], we use a heuristic
for constructing the SEG which ensures that after a finite number of iterations, a
state is reached that only represents concrete states that are also represented by
an already existing (more general) state in the SEG. Then symbolic execution can
continue from this more general state instead. So with this heuristic, the construction
always ends in a complete SEG or an SEG containing the state ERR.

We formalized the concept of "generalization" by a symbolic execution rule
in [25]. Here, the state $\bar{s}$ is a generalization of $s$ if the conditions (g1) to (g6) hold.

Condition (g1) prevents cycles consisting only of refinement and generalization
edges in the graph. Condition (g2) states that the instantiation
$\mu : V_{sym}(\bar{s}) \to V_{sym}(s) \cup \mathbb{Z}$ maps symbolic variables from the more general state
$\bar{s}$ to their counterparts from the more specific state $s$ such that they correspond
to the same program variable. Conditions (g3) to (g6) ensure that all knowledge
present in $\overline{KB}$, $\overline{AL}$, $\overline{PT}$, and $\overline{LI}$ still holds in $s$ with the applied instantiation.


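generalization with instantiation $\mu$

$$\frac{s = (p,\ LV,\ AL,\ PT,\ LI,\ KB)}{\bar{s} = (p,\ \overline{LV},\ \overline{AL},\ \overline{PT},\ \overline{LI},\ \overline{KB})}$$

if

(g1) $s$ has an incoming evaluation edge
(g2) $domain(LV) = domain(\overline{LV})$ and $LV(\mathtt{var}) = \mu(\overline{LV}(\mathtt{var}))$ for all $\mathtt{var} \in V_P$ where
     $LV$ and $\overline{LV}$ are defined
(g3) $\models \langle s \rangle \Rightarrow \mu(\overline{KB})$
(g4) if $[x_1, x_2] \in \overline{AL}$, then $[v_1, v_2] \in AL$ with $\models \langle s \rangle \Rightarrow v_1 = \mu(x_1) \wedge v_2 = \mu(x_2)$
(g5) if $(x_1 \hookrightarrow_{ty} x_2) \in \overline{PT}$,
     then $(v_1 \hookrightarrow_{ty} v_2) \in PT$ with $\models \langle s \rangle \Rightarrow v_1 = \mu(x_1) \wedge v_2 = \mu(x_2)$
(g6) if $(x_{ad} \hookrightarrow^{x_\ell}_{ty} [(\mathit{off}_i : ty_i : x_i..\hat{x}_i)]_{i=1}^{n}) \in \overline{LI}$,
     then either $(v_{ad} \hookrightarrow^{v_\ell}_{ty} [(\mathit{off}_i : ty_i : v_i..\hat{v}_i)]_{i=1}^{n}) \in LI$ with
       • $\models \langle s \rangle \Rightarrow v_{ad} = \mu(x_{ad}) \wedge v_\ell = \mu(x_\ell)$ and
       • $\models \langle s \rangle \Rightarrow v_i = \mu(x_i) \wedge \hat{v}_i = \mu(\hat{x}_i)$ for all $1 \le i \le n$,
     or $v^{start}_1$ points to a list of type $ty$ and length $\ell$ with allocations $[v^{start}_k, v^{end}_k]$
     and values $v_{k,i}$ (for $1 \le k \le \ell$, $1 \le i \le n$) such that
       • $\models \langle s \rangle \Rightarrow v^{start}_1 = \mu(x_{ad}) \wedge \ell = \mu(x_\ell)$,
       • $\models \langle s \rangle \Rightarrow v_{1,i} = \mu(x_i) \wedge v_{\ell,i} = \mu(\hat{x}_i)$ for all $1 \le i \le n$, and
       • if $(x_1 \hookrightarrow_{sy} x_2) \in \overline{PT}$,
         then $\models \langle s \rangle \Rightarrow \mu(x_1) < v^{start}_k \vee \mu(x_1) > v^{end}_k$ for all $1 \le k \le \ell$.
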


Condition (g6) is new compared to [25] and takes list invariants into account.
So for every list invariant $\bar{l}$ of $\bar{s}$ there is either a corresponding list invariant $l$
in $s$ such that lists represented by $l$ in $s$ are also represented by $\bar{l}$ in $\bar{s}$, or there
is a concrete list in $s$ that is represented by $\bar{l}$ in $\bar{s}$. The last condition of the
latter case ensures that disjointness between the memory domains of $\overline{PT}$ and
$\overline{LI}$ is preserved. See [18] for the soundness proof of the extended generalization
rule, i.e., that every concrete state represented by $s$ is also represented by $\bar{s}$.

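Our merging technique always yields generalizations according to this rule,
i.e., the edges from L and M to O in Fig. 3 are generalization edges. Here, one
chooses $\mu_L$ and $\mu_M$ such that $\mu_L(x_{mem}) = v_{mem}$, $\mu_L(x_\ell) = 1$, $\mu_L(x_{nd}) = v_{nd}$, $\mu_L(\hat{x}_{nd})
= v_{nd}$, $\mu_L(x_{next}) = 0$, $\mu_M(x_{mem}) = w_{mem}$, $\mu_M(x_\ell) = 2$, $\mu_M(x_{nd}) = w_{nd}$, $\mu_M(\hat{x}_{nd}) = v_{nd}$,
and $\mu_M(x_{next}) = v_{mem}$. In both cases, all conditions of the second case of (g6) with
$\ell_L = 1$ and $\ell_M = 2$ are satisfied. With $\mu_L(x_{kinc}) = 1$ resp. $\mu_M(x_{kinc}) = 2$, we also
have $\models \langle L \rangle \Rightarrow \mu_L(x_\ell) = \mu_L(x_{kinc})$ resp. $\models \langle M \rangle \Rightarrow \mu_M(x_\ell) = \mu_M(x_{kinc})$.

P: $((\mathtt{bodyF},7),\ \{\mathtt{tail} = x_{mem},\ \mathtt{curr\_next} = y_{cn}, \ldots\},\ \{[y_{mem}, y^{end}_{mem}], \ldots\},\ \{y_{mem} \hookrightarrow_{\mathtt{i32}} y_{nd}, \ldots\},$
   $\{x_{mem} \hookrightarrow^{x_\ell}_{\mathtt{list}} [(0 : \mathtt{i32} : x_{nd}..\hat{x}_{nd}),\ (8 : \mathtt{list*} : x_{next}..0)]\},\ \{x_{cn} = x_{mem} + 8,\ y_{cn} = y_{mem} + 8, \ldots\})$

Q: $((\mathtt{bodyF},8),\ \{\mathtt{tail} = x_{mem},\ \mathtt{curr\_next} = y_{cn}, \ldots\},\ \{\ldots\},\ \{\ldots\},$
   $\{y_{mem} \hookrightarrow^{y_\ell}_{\mathtt{list}} [(0 : \mathtt{i32} : y_{nd}..\hat{x}_{nd}),\ (8 : \mathtt{list*} : x_{mem}..0)]\},\ \{y_{cn} = y_{mem} + 8,\ y_\ell = x_\ell + 1, \ldots\})$

Fig. 4. Extending a List Invariant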


3.2 Adapting List Invariants 


To handle and modify list invariants, three of our symbolic execution rules have 
to be changed. Section 3.2.1 presents a variant of the store rule where the list 
invariant is extended by an element. In Sect. 3.2.2, we adapt the load rule to load 
values from the first list element and we present a variant of the getelementptr 
rule for list traversal. Soundness of our new rules is proved in [18]. For all other 
instructions, the symbolic execution rules from [25] remain unchanged. 


3.2.1 List Extension 

After merging L and M, symbolic execution continues from the more general 
state O in Fig. 3. Here, the values of k and kinc and the length of the list are not 
concrete but any positive (resp. non-negative) value with xg = Zginc = £k + 1. The 
symbolic execution of O is similar to the steps from B to J in Sect. 3 (see Fig. 1). 
First, the value ginc stored at k_ad is loaded to k. To distinguish whether k < n 
still holds, the next state is refined. From the refined state with k < n, we enter 
the loop body again. A new block [Ynen, yor] of 16 bytes is allocated and Ymen is 
assigned to mem and curr. Then, a new unknown value yna is assigned to nondet. 
The address of the i32 value of the current element (equal to Ynen) is computed 
by the first getelementptr instruction of the loop and the value yna of nondet 
is stored at it. The second getelementptr instruction computes the address Yen 
of the recursive field and results in State P in Fig. 4, where Yen = Ymem + 8 is added 
to KB’. Now, store sets the address of the next field to the head of the list 
created in the previous iteration. Since this instruction extends the list by an 
element, instead of adding Yen Crist* Lmem to PT Q, we extend the list invariant: 
The length is set to yg and identified with z +1 in KB®. The pointer £mem to the 
first element is replaced by Ymen, while the first recursive field in the list gets the 
value Zmem. Since (Ymen i32 Yna) € PT”, Yna is the value of the first i32 integer 
in the list. We remove all entries from PT® that are already contained in the 
new list invariant, e.g., Ymem i32 Yna- 
To formalize this adaption of list invariants, we introduce a modified rule
for store in addition to the one in [25]. It handles the case where there is a
concrete list at some address $v^{start}$, pa points to the m-th field of this list’s first
element, one wants to store a value t at the address pa, and one already has a list
invariant l for the “tail” of the list in the j-th field (if $m \neq j$) resp. for the list at
the address t (if $m = j$). In all other cases, the ordinary store rule is applied.

More precisely, let the list invariant l describe a list of length $v_\ell$ at the address
$v_{ad}$. Then l is replaced by a new list invariant l’ which describes the list at the
address $v^{start}$ after storing t at the address pa. Irrespective of whether $m \neq j$ or
$m = j$, the resulting list at $v^{start}$ has the list at $v_{ad}$ as its “tail” and thus, its
length $v'_\ell$ is $v_\ell + 1$. We prevent sharing of different elements by removing the
allocation $[v^{start}, v^{end}]$ of the list and all points-to information of pointers in
$[v^{start}, v^{end}]$.

[Figure: the list at $v^{start}$ before and after the store, for the cases $m \neq j$ and $m = j$]


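list extension (p : “store ty t, ty* pa”, $t \in \mathcal{V}_P \cup \mathbb{N}$, $pa \in \mathcal{V}_P$)

$$\frac{s = (p,\ LV,\ AL,\ PT,\ LI,\ KB)}{s' = (p^+,\ LV,\ AL \setminus \{[v^{start}, v^{end}]\},\ PT',\ LI \setminus \{l\} \cup \{l'\},\ KB')} \quad \text{if}$$

• there is $l = (v_{ad} \hookrightarrow^{v_\ell}_{lty} [(off_i : lty_i : v_i..\hat{v}_i)]_{i=1}^{n}) \in LI$ with $lty_j = lty*$
• there is $[v^{start}, v^{end}] \in AL$ with $\models \langle s \rangle \Rightarrow v^{end} = v^{start} + size(lty) - 1$
• there exists $1 \leq m \leq n$ such that $ty = lty_m$ and $\models \langle s \rangle \Rightarrow LV(pa) = v^{start} + off_m$
• $\models \langle s \rangle \Rightarrow v_{ad} = v'_j$ if $m \neq j$ and $\models \langle s \rangle \Rightarrow v_{ad} = LV(t)$ if $m = j$
• for all $1 \leq i \leq n$ with $i \neq m$ there exist $v_i^{start}, v'_i \in \mathcal{V}_{sym}$
  with $\models \langle s \rangle \Rightarrow v_i^{start} = v^{start} + off_i$ and $(v_i^{start} \hookrightarrow_{lty_i} v'_i) \in PT$
• $PT' = \{(x_1 \hookrightarrow_{sy} x_2) \in PT \mid\ \models \langle s \rangle \Rightarrow (v^{end} < x_1) \vee (x_1 + size(sy) - 1 < v^{start})\}$
• $l' = (v^{start} \hookrightarrow^{v'_\ell}_{lty} [(off_i : lty_i : v'_i..\hat{v}_i)]_{i=1}^{n})$
• $KB' = KB \cup \{v'_m = LV(t),\ v'_\ell = v_\ell + 1\}$, where $v'_m, v'_\ell$ are fresh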


3.2.2 List Traversal 

After the current element $y_{mem}$ is stored at $x_{tp}$, and the value $x_{kinc}$ of
k is incremented to $y_{kinc}$ and stored at $x_{k\_ad}$, we reach a state R at
position (cmpF,0) by the branch instruction. However, our already existing state
O is more general than R, i.e., we can draw a generalization edge from R to O
using the generalization rule with the instantiation $\mu_R$ where $\mu_R(x_{mem}) = y_{mem}$,
$\mu_R(x_{nd}) = y_{nd}$, $\mu_R(x_{np}) = y_{np}$, $\mu_R(x_k) = x_{kinc}$, $\mu_R(x_{kinc}) = y_{kinc}$, $\mu_R(x_\ell) = y_\ell$,
$\mu_R(\hat{x}_{nd}) = \hat{x}_{nd}$, and $\mu_R(x_{next}) = x_{mem}$. Thus, the cycle of the first loop closes here.
[Figure: States U (at position (bodyW, 1)) and V (at position (bodyW, 2))]

Fig. 5. Traversing a List Invariant


As mentioned, in the path from O to R there is a state at position (cmpF, 1)
which is refined (similar to State C). If k < n holds, we reach R. The other
path with $k \geq n$ leads out of the for loop to the block initPtr followed by
the while loop (see State S and the corresponding LLVM code below).
The value $x_{mem}$ at address tail_ptr is loaded to tail’ and stored at a new pointer
variable ptr. State T is reached after the first iteration of the while loop body.
Here, block cmpW loads the value $x_{mem}$ stored at ptr to
str. Since it is not the null pointer, we enter bodyW, which corresponds to the
body of the while loop. First, $x_{mem}$ is cast to an i8 pointer. Then getelementptr
computes a pointer $x_{np}$ to the next element by adding 8 bytes to $x_{mem}$. After
another cast back to a list* pointer, we load the content of the new pointer to
next. To this end, we need the following new variant of the load rule to load
values that are described by a list invariant.

State S: $((\mathtt{initPtr}, 0),\ \{\mathtt{tail\_ptr} = x_{tp}, \ldots\},\ \{\ldots\},\ \{x_{tp} \hookrightarrow_{\mathtt{list*}} x_{mem}, \ldots\},$
$\{x_{mem} \hookrightarrow^{x_\ell}_{\mathtt{list}} [(0 : \mathtt{i32} : x_{nd}..\hat{x}_{nd}), (8 : \mathtt{list*} : x_{next}..0)]\},\ \{\ldots\})$

State T: $((\mathtt{cmpW}, 0),\ \{\mathtt{ptr} = x_{ptr}, \mathtt{curr'} = x_{mem}, \mathtt{next\_ptr} = x_{np}, \mathtt{next} = x_{next}, \ldots\},$
$\{[x_{ptr}, x_{ptr}^{end}], \ldots\},\ \{x_{ptr} \hookrightarrow_{\mathtt{list*}} x_{next}, \ldots\},$
$\{x_{mem} \hookrightarrow^{x_\ell}_{\mathtt{list}} [(0 : \mathtt{i32} : x_{nd}..\hat{x}_{nd}), (8 : \mathtt{list*} : x_{next}..0)]\},\ \{x_{np} = x_{mem} + 8, \ldots\})$

initPtr:
  0: tail’ = load list*, list** tail_ptr
  1: store list* tail’, list** ptr
  2: br label cmpW

cmpW:
  0: str = load list*, list** ptr
  1: notnull = icmp ne list* str, null
  2: br i1 notnull, label bodyW, label ret

bodyW:
  0: curr’ = bitcast list* str to i8*
  1: next_ptr = getelementptr i8, i8* curr’, i64 8
  2: next_ptr’ = bitcast i8* next_ptr to list**
  3: next = load list*, list** next_ptr’
  4: store list* next, list** ptr
  5: br label cmpW
load from list invariant (p : “x = load ty, ty* ad_i”, $x, ad\_i \in \mathcal{V}_P$)

$$\frac{s = (p,\ LV,\ AL,\ PT,\ LI,\ KB)}{s' = (p^+,\ LV[x := w],\ AL,\ PT,\ LI,\ KB \cup \{w = v_i\})} \quad \text{if } w \in \mathcal{V}_{sym} \text{ is fresh and}$$

• there is $l = (v_{ad} \hookrightarrow^{v_\ell}_{lty} [(off_i : ty_i : v_i..\hat{v}_i)]_{i=1}^{n}) \in LI$
• there exists $1 \leq i \leq n$ such that $ty = ty_i$ and $\models \langle s \rangle \Rightarrow LV(ad\_i) = v_{ad} + off_i$

With this new load rule, the content of the new pointer is identified as
$x_{next}$. It is loaded to next and stored at $x_{ptr}$. Then we return to the block cmpW
(State T). Merging T with its predecessor at the same program position is not
possible yet since the domains of the respective LV functions do not coincide.
Now, $x_{next}$ is loaded to str and compared to the null pointer. Since we do
not have information about $x_{next}$, T’s successor state is refined to a state with
$x_{next} = 0$ (which starts a path out of the loop to a return state), and to a state
with $x_{next} \geq 1$, which reaches U after a few evaluation steps, see Fig. 5. Now,
getelementptr computes the pointer $x'_{np} = x_{next} + 8$ to the third element of the
list, which is assigned to next_ptr. $\langle U \rangle$ contains $x_\ell \geq 2$ since the first and the last
pointer value are known to be different ($x_{next} \neq 0$). This information is crucial for
creating a new list invariant starting at $x_{next}$, which is used in the next iteration
of the loop. Therefore, if our list invariant did not contain variables for the first
and the last pointer, we could not prove termination of the program. In such a
case where the pointer to the third element of a list invariant is computed and
the length of the list is at least two, we traverse the list invariant to retain the
correspondence between the computed pointer $x'_{np}$ and the new list invariant.
In the resulting state V, we represent the first list element by an allocation
$[x_{mem}, x_{mem}^{end}]$ and preserve all knowledge about this element that was encoded in
the list invariant ($x_{mem}^{end} = x_{mem} + 15$, $x_{mem} \hookrightarrow_{i32} x_{nd}$, $x_{np} \hookrightarrow_{list*} x_{next}$). Moreover, we
adapt the list invariant such that it now represents the list at $x_{next}$ (i.e., without
its first element) starting with the value $x'_{nd}$. We also relate the length of the
new list invariant to the length of the former one ($x'_\ell = x_\ell - 1$).

Thus, in addition to the rule for getelementptr in [25], we now introduce 
rules for list traversal via getelementptr. The rule below handles the case where 
the address calculation is based on the type i8 and the getelementptr instruc- 
tion adds the number of bytes given by the term t to the address pa. Here, the 
offsets in our list invariants are needed to compute the address of the accessed 
field. We also have similar rules for list traversal via field access (i.e., where the 
next element is accessed using curr’->next as in the for loop) and for the case 
where we cannot prove that the length $v_\ell$ of the list is at least 2, see [18].
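list traversal (p : “pb = getelementptr i8, i8* pa, im t”, $t \in \mathcal{V}_P \cup \mathbb{N}$, $pa, pb \in \mathcal{V}_P$)

$$\frac{s = (p,\ LV,\ AL,\ PT,\ LI,\ KB)}{s' = (p^+,\ LV[pb := w_j^{start}],\ AL \cup \{[v^{start}, v^{end}]\},\ PT',\ LI \setminus \{l\} \cup \{l'\},\ KB')} \quad \text{if}$$

• there is $l = (v_{ad} \hookrightarrow^{v_\ell}_{ty} [(off_i : ty_i : v_i..\hat{v}_i)]_{i=1}^{n}) \in LI$ with $ty_j = ty*$,
  $\models \langle s \rangle \Rightarrow LV(pa) = v_j$, $\models \langle s \rangle \Rightarrow LV(t) = off_j$, and $\models \langle s \rangle \Rightarrow v_\ell \geq 2$
• $PT' = PT \cup \{(v_i^{start} \hookrightarrow_{ty_i} v_i) \mid 1 \leq i \leq n\}$
• $l' = (w_{ad} \hookrightarrow^{w_\ell}_{ty} [(off_i : ty_i : w_i..\hat{v}_i)]_{i=1}^{n})$
• $KB' = KB \cup \{v^{start} = v_{ad},\ v^{end} = v^{start} + size(ty) - 1,\ w_{ad} = v_j,\ w_\ell = v_\ell - 1,$
  $w_j^{start} = w_{ad} + off_j\} \cup \{v_i^{start} = v_{ad} + off_i \mid 1 \leq i \leq n\}$
• $v^{start}, v^{end}, v_1^{start}, \ldots, v_n^{start}, w_{ad}, w_\ell, w_j^{start}, w_1, \ldots, w_n \in \mathcal{V}_{sym}$ are fresh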
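The effect of the list traversal rule on a state can be illustrated by a small sketch. It is not the implementation in AProVE: the classes `Field` and `ListInvariant`, the helper `fresh`, and the representation of the knowledge base as a list of strings are hypothetical and only serve the illustration. The premise that the list length is at least 2 would be checked by an SMT solver and is left to the caller here.

```python
from dataclasses import dataclass
from itertools import count

_ids = count()

def fresh(prefix):
    """Return a fresh symbolic variable name (hypothetical naming scheme)."""
    return f"{prefix}{next(_ids)}"

@dataclass(frozen=True)
class Field:
    offset: int   # off_i
    type: str     # ty_i
    first: str    # value of this field in the first list element
    last: str     # value of this field in the last list element

@dataclass(frozen=True)
class ListInvariant:
    address: str  # v_ad
    length: str   # v_l
    type: str     # ty
    size: int     # size(ty) in bytes
    fields: tuple # the entries (off_i : ty_i : first..last)

def traverse(inv, j, kb):
    """Peel the first element off `inv` along its j-th (recursive) field.

    Returns the new allocation, the new points-to entries, and the list
    invariant for the tail. New equations are appended to `kb`.
    """
    rec = inv.fields[j]
    assert rec.type == inv.type + "*", "field j must be the recursive pointer"
    start, end = fresh("v_start"), fresh("v_end")
    w_ad, w_len = fresh("w_ad"), fresh("w_len")
    kb += [f"{start} = {inv.address}",
           f"{end} = {start} + {inv.size} - 1",
           f"{w_ad} = {rec.first}",
           f"{w_len} = {inv.length} - 1"]
    points_to, tail_fields = [], []
    for f in inv.fields:
        addr = fresh("v_fld")
        kb.append(f"{addr} = {inv.address} + {f.offset}")
        # The first element becomes an explicit allocation with known contents.
        points_to.append((addr, f.type, f.first))
        # The tail keeps the last values but gets fresh first values.
        tail_fields.append(Field(f.offset, f.type, fresh("w"), f.last))
    tail = ListInvariant(w_ad, w_len, inv.type, inv.size, tuple(tail_fields))
    return (start, end), points_to, tail

# The list invariant of State T in the running example.
inv = ListInvariant("x_mem", "x_len", "list", 16,
                    (Field(0, "i32", "x_nd", "x_nd_last"),
                     Field(8, "list*", "x_next", "0")))
kb = []
alloc, points_to, tail = traverse(inv, 1, kb)
```

In this sketch, the tail invariant starts at a fresh variable that is equated with `x_next`, and its length is equated with `x_len - 1`, which mirrors the step from State U to State V.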


We continue the symbolic execution of State V in our example and finally
obtain a complete SEG with a path from a state W at the position (cmpW,0) to
the next state W’ at this position, and a generalization edge back from W’ to
W using an instantiation $\mu_W$. Both W and W’ contain a list invariant similar
to T where instead of the length $x_\ell$ in T, we have the symbolic variables $z_\ell$ and
$z'_\ell$ in W and W’, where $\mu_W(z_\ell) = z'_\ell$ (see [18] for more details).


[Figure: States W and W’ at position (cmpW, 0)]


4 Proving Termination 


To prove termination of a program P, as in [25] the cycles of the SEG are
translated to an integer transition system whose termination implies termination
of P. The edges of the SEG are transformed into ITS transitions whose
application conditions consist of the state formulas $\langle s \rangle$ and equations to identify
corresponding symbolic variables of the different states. For evaluation and
refinement edges, the symbolic variables do not change. For generalization edges,
we use the instantiation $\mu$ to identify corresponding symbolic variables. In our
example, the ITS has cyclic transitions of the following form:


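$$\begin{array}{rcl}
O(x_n, x_k, x_{kinc}, \ldots) & \to^{+} & R(x_n, x_k, x_{kinc}, \ldots) \mid x_{kinc} = x_k + 1 \wedge x_n > x_k \wedge \ldots \\
R(x_n, x_k, x_{kinc}, \ldots) & \to & O(x_n, x_{kinc}, \ldots) \\
W(z_\ell, z'_\ell, \ldots) & \to^{+} & W'(z_\ell, z'_\ell, \ldots) \mid z'_\ell = z_\ell - 1 \wedge z_\ell \geq 1 \wedge \ldots \\
W'(z_\ell, z'_\ell, \ldots) & \to & W(z'_\ell, \ldots)
\end{array}$$
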
The first cycle resulting from the generalization edge from R to O terminates
since k is increased until it reaches n. The generalization edge yields a condition
identifying $x_{kinc}$ in R with $x_k$ in O, since $\mu_R(x_k) = x_{kinc}$. With the conditions
$x_{kinc} = x_k + 1$ and $x_n > x_k$ (from $KB^O$), the resulting transitions of the ITS
are terminating. The second cycle from the generalization edge from W’ to W
terminates since the length of the list starting with curr’ decreases. Although
there is no program variable for the length, due to our list invariants the states
contain variables for this length, which are also passed to the ITS. Thus, the
ITS contains the variable $z_\ell$ (where $z_\ell$ in W is identified with $z'_\ell$ in W’ due to
$\mu_W(z_\ell) = z'_\ell$). Since the condition $z'_\ell = z_\ell - 1$ is obtained on the path from W to
W’ and $z_\ell \geq 1$ is part of $\langle W \rangle$ due to the list invariant with length $z_\ell$ in $LI^W$,
the resulting transitions of the ITS clearly terminate. Analogous to [25, Cor. 11
and Thm. 13], we obtain the following theorem. To prove that a complete SEG
represents all program paths, in [25] we used the LLVM semantics defined by
the Vellvm project [26]. One now also has to prove soundness of those symbolic
execution rules which were modified due to the new concept of list invariants
(i.e., generalization, list extension, and list traversal), see [18].
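The termination argument for the two cycles can be replayed on concrete values. The following sketch is only an illustration under simplifying assumptions: each cycle is collapsed into a single step function on the relevant variables, and the measures $x_n - x_k$ and $z_\ell$ are candidate ranking functions chosen for this example. Testing on concrete inputs does not replace the termination proof performed on the ITS.

```python
def first_cycle(state):
    """One round O -> R -> O: enabled if x_n > x_k, then x_k becomes x_k + 1."""
    x_n, x_k = state
    if x_n > x_k:
        return (x_n, x_k + 1)
    return None

def second_cycle(state):
    """One round W -> W' -> W: enabled if z_len >= 1, then z_len becomes z_len - 1."""
    (z_len,) = state
    if z_len >= 1:
        return (z_len - 1,)
    return None

def count_rounds(step, rank, state):
    """Iterate `step` and check that `rank` is bounded and strictly decreasing."""
    rounds = 0
    nxt = step(state)
    while nxt is not None:
        assert rank(state) >= 0 and rank(nxt) < rank(state)
        state, rounds = nxt, rounds + 1
        nxt = step(state)
    return rounds

rounds_for = count_rounds(first_cycle, lambda s: s[0] - s[1], (5, 0))
rounds_while = count_rounds(second_cycle, lambda s: s[0], (3,))
```

The second measure is available only because the list invariant introduces a symbolic variable for the list length, since the program itself has no variable for it.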


Theorem 1 (Memory Safety and Termination). Let P be a program with a 
complete SEG G. Since a complete SEG does not contain ERR, P is memory safe 
for all concrete states represented by the states in G.⁴ If the ITS corresponding
to G is terminating, then P is also terminating for all states represented by G. 


5 Conclusion, Related Work, and Evaluation 


We presented a new approach for automated proofs of memory safety and ter- 
mination of C/LLVM-programs on lists. It first constructs a symbolic execution 
graph (SEG) which overapproximates all program runs. Afterwards, an integer 
transition system (ITS) is generated from this graph whose termination is proved 
using standard techniques. The main idea of our new approach is the extension 
of the states in the SEG by suitable list invariants. We developed techniques to 
infer and modify list invariants automatically during the symbolic execution. 

During the construction of the SEG, the list invariants abstract from a con- 
crete number of memory allocations to a list of allocations of variable length 
while preserving knowledge about some of the contents (the values of the fields 
of the first and the last element) and the list shape (the start address of the first 
element, the list length, and the content of the last recursive pointer which allows 
us to distinguish between cyclic and acyclic lists). They also contain information 
on the memory arrangement of the list fields which is needed for programs that 
access fields via pointer arithmetic. The symbolic variables for the list length 
and the first and last values of list elements are preserved when generating an 
ITS from the SEG. Thus, they can be used in the termination proof of the ITS 
(e.g., the variables for list length can occur in ranking functions). 

In [5,6,22] we developed a technique for termination analysis of Java, based 
on a program transformation to integer term rewrite systems instead of ITSs. 
This approach does not require specific list invariants as recursive data structures 
on the heap are abstracted to terms. However, these terms are unsuitable for
C, since they cannot express memory allocations and the connection to their
contents.

4 Our approach can only prove but not disprove memory safety, i.e., a SEG with the
state ERR just means that we failed in showing memory safety.

Separation logic predicates for termination of list programs were also used in 
[1], but their list predicates only consider the list length and the recursive field, 
but no other fields or offsets. The tools Cyclist [24] and HipTNT+ [19] are integrated
in separation logic systems which also allow defining heap predicates. However,
they require annotations and hints which parameters of the list predicates
are needed as a termination measure. The tool 2LS [20] also provides basic sup- 
port for dynamic data structures. But all these approaches are not suitable if ter- 
mination depends on the contents or the shape of data structures combined with 
pointer arithmetic. In [10], programs can be annotated with arithmetic and struc- 
tural properties to reason about termination. In contrast, our approach does not 
need hints or annotations, but finds termination arguments fully automatically. 

We implemented our approach in AProVE [25]. While C programs with lists 
are very common, existing tools can hardly prove their termination. Therefore, 
the current benchmark collections for termination analysis contain almost no 
list programs. In 2017, a benchmark set⁵ of 18 typical C-programs on lists was
added to the Termination category of the Competition on Software Verification 
(SV-COMP) [3], where 9 of them are terminating. Two of these 9 programs do 
not need list invariants, because they just create a list without operating on 
it afterwards. The remaining seven terminating programs create a list and then 
traverse it, search for a value, or append lists and compute the length afterwards. 
Only a few tools in SV-COMP produced correct termination proofs for programs
from this set: HipTNT+ and 2LS failed for all of them. CPAchecker [2] and 
PeSCo [23] proved termination and non-termination for one of these programs in 
2020. UAutomizer [8] proved termination for two and non-termination for seven 
programs. The termination proofs of CPAchecker, PeSCo, and UAutomizer only 
concern the programs that just create a list. Our new version of AProVE is the 
only termination prover⁶ that succeeds if termination depends on the shape or
contents of a list after its creation. Note that for non-termination, a proof is a 
single non-terminating program path, so here list invariants are less helpful. 

For the Termination Competition [15] 2022, we submitted 18 terminating C 
programs on lists⁷ (different from the ones at SV-COMP), where two of them
just create a list. Three traverse it afterwards (by a loop or recursion), and ten 
search for a value, where for nine, also the list contents are relevant for termina- 
tion. Three programs perform common operations like inserting or deleting an 
element. UAutomizer proves termination for a program that just creates a list but 
not for programs operating on the list afterwards. With our approach, AProVE 
succeeds on 17 of the 18 programs. Overall, AProVE and UAutomizer were the two
most powerful tools for termination of C in SV-COMP 2022 and the Termination
Competition 2022, with UAutomizer winning the former and AProVE winning
the latter competition. To download AProVE, run it via its web interface, and for
details on our experiments, see https://aprove-developers.github.io/recursive_structs.

              SV-C T.     SV-C Non-T.   TermCmp T.
AProVE        7 (of 9)    5 (of 9)      17 (of 18)
UAutomizer    2 (of 9)    7 (of 9)      1 (of 18)

5 https://github.com/sosy-lab/sv-benchmarks/tree/master/c/termination-memory-linkedlists.

6 We did not compare with the tool VeriFuzz [21], since it does not prove termination
but only tests for non-termination and thus, it is unsound for inferring termination.

7 https://github.com/TermCOMP/TPDB/tree/master/C/Hensel_22.

Abstract. Several new algorithms for deciding emptiness of Boolean combinations
of regular languages and of languages of alternating automata have been
proposed recently, especially in the context of analysing regular expressions
and in string constraint solving. The new algorithms demonstrated significant
potential, but they have never been systematically compared, neither among each
other nor with the state-of-the-art implementations of existing (non)deterministic
automata-based methods. In this paper, we provide such a comparison as well as
an overview of the existing algorithms and their implementations. We collect a
diverse benchmark mostly originating in or related to practical problems from
string constraint solving, analysing LTL properties, and regular model checking,
and evaluate collected implementations on it. The results reveal the best tools and
hint at what the best algorithms and implementation techniques are. Roughly,
although some advanced algorithms are fast, such as antichain algorithms and
reductions to IC3/PDR, they are not as overwhelmingly dominant as sometimes
presented and there is no clear winner. The simplest NFA-based technology may
sometimes be a better choice, depending on the problem source and the implementation
style. We believe that our findings are relevant for the development of
automata techniques as well as for related fields such as string constraint solving.


1 Introduction 


Efficient representation of regular properties of finite words has been the subject of 
research for a long time, with applications and results spanning much of the field of for- 
mal reasoning, including regular expression matching, verification, testing, modelling, 
or general decision procedures of logics. When regular properties are combined using 
Boolean and similar operations, interesting decision problems are PSPACE-complete. 
This includes the most essential problem of language emptiness (further just empti- 
ness). The textbook approaches that use deterministic automata are plagued by state 
space explosion. Determinization and complementation are done by the exponential
subset construction and conjunction is quadratic. This motivated the research on
efficient algorithms for non-deterministic and alternating finite automata (NFA and AFA,
respectively). 

Using nondeterminism and alternation, one can gain one or two levels of exponential
savings in the size of automata, respectively. Alternation in the context of automata was
first studied in [24] and [18,38,53], and extensively in the context of automata over
infinite words and temporal logics (e.g., [57,58,66,76]). It adds conjunctive branching
to the disjunctive non-deterministic branching and allows avoiding the blow-up in the
automata size completely. However, from the perspective of the worst-case complexity,
the gained succinctness is paid back by the PSPACE-completeness of language
emptiness. Still, the more succinct representation gives more opportunities for clever
heuristics that combat the worst-case complexity and work in practical cases, essentially
by avoiding re-creation of the entire (non)deterministic representation.

Several very promising techniques and their implementations have been proposed in
recent years. The latest advances in testing AFA emptiness appeared in the context
of analysing combinations of regular expressions and in string solving. A group of these
techniques is based on reducing AFA emptiness to reachability in a Boolean transition
system and using existing implementations of model-checking algorithms, most
notably of IC3/PDR [15,46], such as ABC [17], nuXmv [22], or IC3Ref [16], to solve 
it [27,28,47,80]. The most recent contribution from [73] extends the SMT-solver Z3 
with symbolic derivatives, a generalisation of Antimirov derivatives of regular expres- 
sions. Z3 uses them to convert a combination of regular expressions into an alternat- 
ing/Boolean automaton and on the fly tests its language emptiness through the classical 
de-alternation and a search for an accepting configuration. 

A slightly older algorithm for testing equivalence of AFA (convertible to an emptiness
test) is based on computing bisimulation up to congruence [30]. It generalizes the original
NFA-equivalence test of [11]. The congruence closure algorithms were preceded
by the antichain algorithms that optimize the subset construction by subsumption
pruning [41,82], and by the first attempt to apply model checking algorithms, namely
the algorithm Impact of [63], to emptiness of combinations of regular properties [40].
Lastly, the area of string constraint solving gave rise to a large variety of string con- 
straint solvers. They approach combinations of regular properties through a spectrum of 
clever techniques based e.g. on automata, transformations to other types of constraints, 
reasoning on lengths of strings, Parikh images, etc. (e.g. Z3 [65,73], CVC4/5 [7,68], 
Z3Str4 [9], OSTRICH [25,26], Trau [4,5] to name a few). 

These works demonstrate a significant promise, but they are presented in specific, 
often narrow contexts and under varying views on state of the art. Consequently, they 
have never been sufficiently compared against each other. Even comparisons against
the most efficient implementations of the more standard techniques based on
(non)deterministic automata are rare. String solvers were compared only against string solvers,
advanced AFA-emptiness tests were compared only against the basic de-alternation.
A somewhat interesting comparison was done only between NFA-antichain and
up-to-congruence-based language inclusion and equivalence tests in [11] and in [39], and
between the basic antichain based AFA emptiness and a version that uses abstract 
interpretation [41]. A number of works also take as their baseline implementations 
of automata or string solvers which, even though being respectable tools in their own 
right, are currently not the fastest solvers of combinations of regular properties in either 
category. On top of that, all the mentioned works on solving combinations of regular 
properties use only narrow benchmarks, often mutually exclusive. 

A systematic comparison of tools and algorithms on meaningful benchmarks is obviously
needed to answer the questions ‘What to use?’ and ‘What to compare with?’, and
generally for the field of reasoning about regular properties and automata to progress.
We thus present a comparison of implementations of major algorithms. We compare
the tools on a large benchmark of problems that we have collected from other works,
from string constraint solving problems, analysis of regular expressions, regular model
checking, and analysing LTL properties of systems. We believe that it is currently the
most comprehensive benchmark in existence. Our main focus is on examples around
string solving and analysis of regular expressions, which is also where most of the
recent developments have happened. These benchmarks mostly allow for relatively
simple representations of automata transition functions. Even though the alphabets in
examples coming from this area are large (e.g. UNICODE with up to $2^{32}$ symbols), the
alphabet size can, in most cases, be reduced to a few symbols by working with alphabet
minterms (classes of indistinguishable symbols) instead of individual symbols. The
issue of effective symbolic representation of transition relations with large alphabets
then does not dominate the evaluation, although it would be critical in other application
areas, such as deciding WS1S (monadic second-order logic of one successor) or linear
integer arithmetic [20,44,81].

We have obtained results that paint the basic landscape of the available techniques 
and tools. They identify tools and approaches which are likely to work well and should 
be used as the baseline in comparisons. We also provide a relatively diverse and large 
benchmark to be used in comparisons. The results broadly confirm that the new algo- 
rithms represent a leap in efficiency compared to the technology of DFA and also make 
a reduction of a problem to language emptiness of alternating automaton an attractive 
option. On the other hand, they challenge some folklore knowledge and conclusions 
implied elsewhere. For instance, reductions to IC3/PDR, although yielding one of the
fastest algorithms, are not as vastly superior as sometimes presented. Some practically
relevant benchmark categories are best solved by a combination of an antichain
algorithm with a SAT solver. Others, surprisingly many in fact, by a simple
efficiency-oriented implementation of basic algorithms for nondeterministic automata. Our results
also underscore that there is no universal silver bullet. The particular kind of the prob- 
lem, determined to a large degree by its source, is a decisive factor that should be taken 
into account when choosing and tuning a solver. 

We will maintain and further grow the benchmark set, at GitHub [1], as well as 
the framework for the entire comparison, at [2], in order for it to be easily usable and 
extensible by others. 


2 Preliminaries 


A (nondeterministic) finite automaton (NFA) over $\Sigma$ is a tuple $A = (Q, \Delta, I, F)$ where
$Q$ is a finite set of states, $\Delta$ is a set of transitions of the form $q \xrightarrow{a} r$ with $q, r \in Q$ and
$a \in \Sigma$, $I \subseteq Q$ is the set of initial states, and $F \subseteq Q$ is the set of final states. A run of $A$
over a word $w \in \Sigma^*$ is a sequence $p_0 \xrightarrow{a_1} p_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} p_n$ where for all $1 \leq i \leq n$, it
holds that $a_i \in \Sigma \cup \{\epsilon\}$, $w = a_1 \cdot a_2 \cdots a_n$, and either $p_{i-1} \xrightarrow{a_i} p_i \in \Delta$ or $p_{i-1} = p_i$, $a_i = \epsilon$.
The run is accepting if $p_0 \in I$ and $p_n \in F$, and the language $L(A)$ of $A$ is the set of all
words for which $A$ has an accepting run.

The automaton is deterministic (DFA) if for every state $q$ and symbol $a$, $\Delta$ has
at most one transition $q \xrightarrow{a} r$. Any NFA can be determinized by the subset construction,
which creates the DFA $A' = (2^Q, \Delta', \{I\}, \{S \mid S \cap F \neq \emptyset\})$ where $S \xrightarrow{a} S' \in \Delta'$ iff
$S' = \{s' \mid s \in S \wedge s \xrightarrow{a} s' \in \Delta\}$. The basic automata constructions implementing Boolean
operations with languages are intersection, $A \cap A' = (Q \times Q', \Delta^{\times}, I \times I', F \times F')$ where
$(q, q') \xrightarrow{a} (r, r') \in \Delta^{\times}$ iff $q \xrightarrow{a} r \in \Delta$ and $q' \xrightarrow{a} r' \in \Delta'$, non-deterministic union
$A \cup A' = (Q \cup Q', \Delta \cup \Delta', I \cup I', F \cup F')$, deterministic union by product which is the
same as $\cap$ except that the final states are $F \times Q' \cup Q \times F'$, and complementation which
consists of determinization and complementing the final states.
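The subset construction and the product construction can be sketched in a few lines. The sketch below is only an illustration and does not correspond to any of the compared tools: transitions are assumed to be given explicitly as a set of triples `(q, a, r)`, only the reachable part of the subset automaton is built, and the product is explored on the fly to decide emptiness of an intersection.

```python
from collections import deque

def determinize(alphabet, delta, initial, final):
    """Subset construction restricted to reachable subsets.

    `delta` is a set of triples (q, a, r); returns the DFA states,
    its transition map, its initial state, and its final states.
    """
    start = frozenset(initial)
    states, trans = {start}, {}
    queue = deque([start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            T = frozenset(r for (q, b, r) in delta if q in S and b == a)
            trans[(S, a)] = T
            if T not in states:
                states.add(T)
                queue.append(T)
    dfa_final = {S for S in states if S & set(final)}
    return states, trans, start, dfa_final

def intersection_is_empty(nfa1, nfa2):
    """Search the product of two NFAs (delta, initial, final) for a final pair."""
    (d1, i1, f1), (d2, i2, f2) = nfa1, nfa2
    seen = {(p, q) for p in i1 for q in i2}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if p in f1 and q in f2:
            return False
        for (p0, a, p1) in d1:
            if p0 != p:
                continue
            for (q0, b, q1) in d2:
                if q0 == q and a == b and (p1, q1) not in seen:
                    seen.add((p1, q1))
                    queue.append((p1, q1))
    return True

# Words over {a, b} ending in 'a', and words ending in 'b'.
ends_a = ({(0, "a", 0), (0, "b", 0), (0, "a", 1)}, {0}, {1})
ends_b = ({(0, "a", 0), (0, "b", 0), (0, "b", 1)}, {0}, {1})
dfa_states, dfa_trans, dfa_start, dfa_final = determinize("ab", *ends_a)
```

Exploring only reachable subsets or pairs is the simplest way to postpone the worst-case blow-up, and it is the baseline that the more advanced algorithms discussed later refine.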


Alternating Automata. An alternating finite automaton (AFA) in the most general form
would be a tuple $\mathcal{A} = (\Sigma, P, Q, \Delta, I, F)$ where, when denoting by $\mathbb{B}(X)$ the Boolean
predicate formulae over variables $X$: 1) $\Sigma$ is a finite alphabet; 2) $P$ is a set of unary symbol
predicates with a free variable $\alpha$; 3) $Q$ is a finite set of states; 4) $\Delta : Q \to \mathbb{B}(Q \cup P)$ is
a transition function where states of $Q$ have only positive occurrences; 5) $I \in \mathbb{B}(Q)$ is a
positive initial condition; and 6) $F \in \mathbb{B}(Q)$ is a negative final/accepting condition.¹

It can be interpreted as the forward NFA $\mathcal{A}^f = (\Sigma, \mathcal{P}(Q), \Delta^f, I', F')$ with states $c \subseteq Q$
called configurations of $\mathcal{A}$. Assume a many-sorted interpretation of formulae over variables
$Q$ of the type Boolean (values 0 and 1) and the variable $\alpha$ of the type $\Sigma$. A set
of states $c \subseteq Q$ is understood as an assignment $Q \to \{0, 1\}$ in which $c(q) = 1$ corresponds
to $q \in c$. A pair $(c, a)$, $a \in \Sigma$ is understood as the same assignment extended
with $\alpha \mapsto a$. The satisfaction relation $\models$ between a formula and a configuration $c$ or
a pair $(c, a)$ is defined as usual. The transition relation $\Delta^f$ then contains a transition
$c \xrightarrow{a} c'$ iff $(c', a) \models \bigwedge_{q \in c} \Delta(q)$, and $I'$ and $F'$ are the sets of configurations that satisfy
$I$ and $F$, respectively. It is common to define $\Delta^f$ to contain only the smallest transitions,
that is, for a given $c$ and $a$, only the transitions $c \xrightarrow{a} c'$ with the $\subseteq$-minimal target $c'$ are
in $\Delta^f$.² The language of $\mathcal{A}$, $L(\mathcal{A})$, is the language of $\mathcal{A}^f$.

The AFA can equivalently be interpreted as the backward NFA, the automaton $\mathcal{A}^b =
(\Sigma, \mathcal{P}(Q), \Delta^b, I', F')$ where $c \xrightarrow{a} c' \in \Delta^b$ if $(c', a) \models \Delta(q)$ for each $q \in c$. Here it is
enough to take, for a given $c'$ and $a$, only the transition with the $\subseteq$-largest source $c$³
(this makes the transition relation backward deterministic).
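The forward interpretation suggests a simple emptiness test by exploring configurations, which the following sketch illustrates. It rests on several assumptions that are not part of the definition above: the alphabet is explicit, each transition formula is given per state and symbol as a positive DNF (a list of sets of states, where the empty set stands for true and a missing entry for false), the initial condition is a positive DNF as well, and the negative final condition has the common form "no non-final state is present", so that a configuration is accepting iff it is a subset of a given set of final states.

```python
from collections import deque
from itertools import product

def afa_is_empty(alphabet, delta, initial_clauses, final_states):
    """Forward exploration of AFA configurations (basic de-alternation).

    `delta[(q, a)]` is a list of frozensets, one per clause of the DNF of
    the transition formula of state q under symbol a.
    """
    finals = frozenset(final_states)
    start = {frozenset(c) for c in initial_clauses}
    seen, queue = set(start), deque(start)
    while queue:
        c = queue.popleft()
        if c <= finals:          # c satisfies the final condition
            return False
        for a in alphabet:
            # A successor picks one clause for every state of c.
            choices = [delta.get((q, a), []) for q in c]
            for pick in product(*choices):
                target = frozenset().union(*pick)
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return True

# q0 requires both q1 and q2 after reading 'a'; q1 accepts anything next.
delta_dead = {("q0", "a"): [frozenset({"q1", "q2"})],
              ("q1", "a"): [frozenset()]}           # q2 has no transition
delta_live = dict(delta_dead)
delta_live[("q2", "a")] = [frozenset()]
```

The sketch generates all targets obtained by picking clauses, which includes the $\subseteq$-minimal ones. Antichain algorithms would additionally discard configurations that are subsumed by already visited ones.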


Boolean Automata. Alternating automata may be extended to Boolean finite automata
(BFA) by allowing any Boolean combination in the initial, final, and transition formulae
(states in the initial and transition formulae may occur negatively, states in the final
formula may occur positively). Note that the extension of AFA to BFA is not dramatic,
as a BFA is easily encoded as an AFA with only double the size, by the following steps:
1) for each $q \in Q$, add a state $\overline{q}$ with $\Delta(\overline{q}) = \neg\Delta(q)$, 2) transform all formulas in $I$, $F$, $\Delta$
to DNF, 3) replace all literals $\neg q$ by $\overline{q}$ in $\Delta$ and $I$ and replace literals $q$ by $\neg\overline{q}$ in $F$.


Restricted Forms of AFA Transition Relation. The general form of AFA, as defined
above, is the most succinct. It provides space for most optimizations, such as in [77].
Automata in this form are generated from LTL conversions of [34] used in [30,77]. On
the other hand, only a small subset of algorithms and tools support AFA in this most
liberal form. A common restriction (used e.g. in [30]) is to separate symbols from states in
the transition formulae, that is, having $\Delta(q)$ in the form $\varphi \wedge \psi$ with $\varphi \in \mathbb{B}(P)$, $\psi \in \mathbb{B}(Q)$.
We call such AFA separated. The transition relation can then be seen as a function
$Q \to \mathbb{B}(P) \times \mathbb{B}(Q)$. Separated AFA are often considered with the state formula $\psi$ in the
disjunctive normal form (e.g. in [36,41]), which we call the DNF form, and $\Delta$ then may
be seen as a set of transitions of the form $q \xrightarrow{\varphi} c$ where $\bigwedge c$ is a (positive) clause of $\psi$.

1 This is not the most standard definition of AFA but it allows us to later cover and categorize their
common syntactic variants. See e.g. [18,41,57] for more standard definitions.

2 A state in a configuration is understood as a constraint. The fewer constraints, the more can be
accepted from the configuration. Transitions to more constrained configurations are useless.

3 Going backward, larger configurations are more permissive. Transitions from the same target
with smaller configurations are useless.


The Decision Problems. We will concentrate on two decision problems: 


(1) AFA emptiness asks whether the language of the given AFA is empty. 

(2) Emptiness of Boolean combinations of regular properties (BRE) asks whether a
Boolean combination of regular languages, given as automata or regular expressions,
is empty (languages can be combined with $\cap$, $\cup$, and complement w.r.t. $\Sigma^*$,
which also covers testing inclusion and equivalence⁴).


3 Existing Algorithms and Tools 


In this section, we will overview the existing approaches and tools implementing AFA 
and BRE emptiness. 


3.1 Representation of Automata Transition Relations 


In the simplest form, a predicate on an automaton transition represents a single letter
from the alphabet. This is called an explicit transition. Explicit automata are simple,
allow for low-level optimizations, and implementation of complex algorithms for them
is manageable (such as advanced algorithms for computing simulations [23,50,70]).
The technique of a-priori mintermization, which replaces the alphabet by the alphabet of
minterms (classes of indistinguishable symbols), makes explicit automata usable also
when alphabets are large. However, when the number of minterms tends to explode,
explicit automata do not scale.
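
To make mintermization concrete, the following minimal sketch computes minterms for predicates
given as plain Python sets of characters over a small finite alphabet. The function name
`minterms` and the set-based encoding are assumptions made for illustration; none of the compared
tools is claimed to work this way.

```python
def minterms(alphabet, predicates):
    """Partition `alphabet` into classes of symbols that no predicate distinguishes."""
    classes = {}
    for symbol in sorted(alphabet):
        # The signature records which predicates the symbol satisfies.
        signature = tuple(symbol in p for p in predicates)
        classes.setdefault(signature, set()).add(symbol)
    # Order the classes by their smallest symbol to get a deterministic result.
    return sorted(classes.values(), key=min)


# Two overlapping character classes, [a-c] and [b-d], over the alphabet a..e.
alphabet = set("abcde")
predicates = [set("abc"), set("bcd")]
result = minterms(alphabet, predicates)
```

With k predicates there can be up to 2^k distinct signatures, which is the explosion mentioned
above.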

Various implementations of automata have been using transition predicates imple- 
mented as BDDs, Boolean formulae, formulae over SMT-theory of bit-vectors, inter- 
vals of numbers, etc. This has been systematized in the works on symbolic automata 
[31,33,79], where the symbol predicates may be taken from any effective Boolean 
algebra (and the automata are in the separated form). Even more compact than symbolic
automata are representations of the transition relation used in the WS1S solver
MONA or in some of the implementations of AFA, which in a way drop the restriction
to the separated form. We will discuss the concrete implementations below. 


3.2 (Non)deterministic Finite Automata 


The baseline approach to solve BRE is to use DFA or NFA. Boolean operations are
implemented as the classical constructions listed in Sect. 2. Automata may be kept
deterministic, or they are kept non-deterministic whenever possible and determinized only
before complementing. An important ingredient of achieving efficiency is usually to
minimize automata at least once every few operations (important e.g. in applications
such as regular model checking [12] or some approaches to string solving [4,10,25]).
The deterministic approaches construct the minimal DFA by the Hopcroft, Moore,
Brzozowski, or the Huffman algorithm [19,52,54,64], the non-deterministic approach
may use simulation [23,45,50,55,70] or bisimulation [48,69,75] based reduction methods.
Simulation reduces significantly more but is much costlier. DFA/NFA are implemented
in many libraries. Here we select a representative sample.


4 $L \subseteq L'$ is emptiness of $\overline{L'} \cap L$, and equivalence is emptiness of
$(\overline{L'} \cap L) \cup (L' \cap \overline{L})$.

First, ENFA is the simplest tool, our own implementation of NFA, which was origi- 
nally meant to play the role of a baseline. It uses explicit automata with mintermization. 
It is implemented in C++, with efficiency in mind, but with no extensive optimizations 
(roughly, transitions from a state are stored in a two-layered data structure, the first layer
divided and ordered by symbols, and the second layer ordered by the target state). It
uses an off-the-shelf implementation of one of the newest generation of algorithms for
computing simulation [23,50,70] (which achieve good efficiency through the use of the
partition-relation data structure) taken from the VATA tree automata library [59]
(implementing namely [50]).⁵

The BRICS automata library [67] is often considered a baseline in comparisons [67]. 
It uses primarily deterministic automata and transition relation represented symbolically 
using character ranges. It is written in Java and relatively optimized. 

The AUTOMATA library [78], made in C#, implements symbolic NFA/DFA parame- 
trized by an effective Boolean algebra. We use it with the default algebra of BDDs. 
AUTOMATA has been long developed and has accumulated many optimizations and 
novel techniques for handling symbolic automata (e.g., optimized minimization [32]). 

MONA [44], written in C, is the most influential and optimized implementation 
of deterministic automata. It specialises in deciding WS1S formulae, which besides 
Boolean combinations includes also quantification. The decision procedure generates 
DFA with complex transition relations over large alphabets of bit-vectors. For this pur- 
pose, MONA uses a compact representation of the transition relation: a single MTBDD 
for all transitions originating in a state, with the target states in its leaves. MONA can 
represent only a DFA, hence it always implicitly determinizes. 

VATA [59], written in C++, is a library implementing non-deterministic tree 
automata. As NFA are a special case of tree automata, we can use it as an implementa- 
tion of the basic constructions for explicit NFA. It is relatively optimized. We include 
it into the comparison for its fast implementation of the antichain inclusion checking 
[12,49], which for NFA boils down to the inclusion check of [36]. 
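
The baseline approach of this section (keep automata non-deterministic, determinize only before
complementing, then test emptiness of a product) can be sketched as follows. The `NFA` class and
all function names are hypothetical, the automata are explicit, and both operands are assumed to
share the same alphabet; the sketch does not mirror the API of any library discussed above.

```python
class NFA:
    def __init__(self, alphabet, transitions, initial, final):
        self.alphabet = alphabet        # set of symbols
        self.transitions = transitions  # dict: (state, symbol) -> set of states
        self.initial = initial          # set of states
        self.final = final              # set of states


def determinize(nfa):
    """Subset construction; returns a complete DFA and the set of its states."""
    start = frozenset(nfa.initial)
    trans, seen, work = {}, {start}, [start]
    while work:
        macro = work.pop()
        for a in nfa.alphabet:
            succ = frozenset(t for s in macro
                             for t in nfa.transitions.get((s, a), ()))
            trans[(macro, a)] = {succ}
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    final = {m for m in seen if m & set(nfa.final)}
    return NFA(nfa.alphabet, trans, {start}, final), seen


def complement(nfa):
    dfa, states = determinize(nfa)
    return NFA(dfa.alphabet, dfa.transitions, dfa.initial, states - dfa.final)


def intersection_is_empty(a, b):
    """On-the-fly product search for a pair of jointly accepting states."""
    start = {(p, q) for p in a.initial for q in b.initial}
    seen, work = set(start), list(start)
    while work:
        p, q = work.pop()
        if p in a.final and q in b.final:
            return False
        for sym in a.alphabet:
            for p2 in a.transitions.get((p, sym), ()):
                for q2 in b.transitions.get((q, sym), ()):
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        work.append((p2, q2))
    return True


def included(a, b):
    """L(a) is included in L(b) iff L(a) intersected with the complement of L(b) is empty."""
    return intersection_is_empty(a, complement(b))


sigma = {"a", "b"}
a_star = NFA(sigma, {(0, "a"): {0}}, {0}, {0})                     # a*
any_word = NFA(sigma, {(0, "a"): {0}, (0, "b"): {0}}, {0}, {0})   # (a|b)*
```

Here `included(a_star, any_word)` holds while `included(any_word, a_star)` does not; the
antichain inclusion check avoids building the complement explicitly.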


3.3 Alternating Automata 


De-alternation. The basic approach to AFA emptiness is de-alternation, transformation
to an NFA, either the forward $A^f$ or the backward $A^b$, followed by testing the emptiness
of the resulting NFA. Both NFAs are constructed by a variation on the NFA subset
construction. We are not aware of any tool using pure de-alternation, and we believe
that it would not be competitive. The forward algorithm is however the basis of [73]
used in Z3 where it is run on the fly with a novel symbolic derivative construction
(discussed also in the paragraph on string constraint solvers).


5 In our experiment, simulation is only used after parsing and has minimal overall impact.


Interpolation Based Abstraction Refinement. Attempts to harness model checking
algorithms for AFA emptiness appeared in the context of string solving and processing of
regular expressions. To the best of our knowledge, the earliest attempt was [40], where
conjunctions of regular constraints were solved using the interpolation-based algorithm of
[62]. The interpolation-based abstraction refinement, namely the algorithm Impact of
[63], was also used in [56]. This work concentrated on a more general problem, solving
emptiness of AFA over data words with an infinite data domain (that can relate past and
current values of data variables). Their tool JALTIMPACT [3] (in Java), which we include
in our comparison, can be run on our benchmark too.


Reduction to Reachability and IC3/PDR. The work of [80] presented the first transla- 
tion of string constraints (mostly BRE) into reachability in a Boolean transition system 
(circuit) that was then solved by the model checker nuXmv [22]. This was de facto the 
first reduction of AFA emptiness to reachability in a Boolean transition system (BTS). 

Let us briefly overview the basic principle of the reduction. The forward BTS for
an AFA $A$ has configurations that are Boolean assignments to $Q$, initial and final
configurations satisfy $I$ and $F$, respectively, and transitions are given by the formula
$\varphi^f_\Delta : \bigwedge_{q \in Q} q \to [\Delta(q)]'$. Here we use $[\varphi]'$ to denote the formula
obtained from $\varphi$ by substituting every state $q$ by its primed version $q'$, and we will also
denote by $[c]'$ the primed version $\{q' \mid q \in c\}$ of a configuration $c$. A successor of a
configuration $c$ is any configuration $c'$ such that $[c']'$ satisfies
$\exists Q \exists \alpha : \varphi^f_\Delta \land \bigwedge_{q \in c} q$ (the symbol variable $\alpha$
is of the bit-vector sort). Reachability is then the transitive and reflexive closure of the
successor relation and the reachability problem asks whether a final configuration is
reachable from an initial one. It is the case if and only if $A$ is not empty. The forward
reduction has been used in [80]. Alternatively, the backward BTS for $A$ has the initial
configurations satisfying $F$, final configurations satisfying $I$, and the successor relation
given by the formula $\varphi^b_\Delta : \bigwedge_{q \in Q} q' \to \Delta(q)$.
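
The forward reduction can be illustrated by a naive explicit-state search, where a configuration
is the set of states assigned true and `succ` is a successor of `c` when some symbol satisfies
the transition formula of every state in `c`. This is only a sketch under simplifying
assumptions: formulae are encoded as Python predicates, all subsets of states are enumerated
(exponential), and the names `afa_nonempty_forward`, `delta`, `initial`, and `final` are
hypothetical. A model checker such as IC3/PDR explores the same transition system symbolically.

```python
from itertools import combinations


def powerset(states):
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def afa_nonempty_forward(states, alphabet, delta, initial, final):
    """Search the forward BTS for a final configuration reachable from an initial one.

    delta[q](symbol, config), initial(config) and final(config) are predicates
    standing in for the formulae Delta(q), I and F.
    """
    configs = powerset(states)
    seen = {c for c in configs if initial(c)}
    work = list(seen)
    while work:
        c = work.pop()
        if final(c):
            return True
        for succ in configs:
            if succ in seen:
                continue
            # Some symbol must satisfy Delta(q) over succ for every q in c.
            if any(all(delta[q](a, succ) for q in c) for a in alphabet):
                seen.add(succ)
                work.append(succ)
    return False


states = {"p", "q"}
alphabet = {"a", "b"}
delta = {
    "p": lambda a, c: a == "a" and "q" in c,  # Delta(p) = a and q
    "q": lambda a, c: a == "b",               # Delta(q) = b
}
initial = lambda c: "p" in c                  # I = p
final = lambda c: not c                       # F = not p and not q

nonempty = afa_nonempty_forward(states, alphabet, delta, initial, final)
# Making Delta(q) unsatisfiable leaves no way to discharge q, so the language is empty.
delta_dead = dict(delta, q=lambda a, c: False)
empty_case = afa_nonempty_forward(states, alphabet, delta_dead, initial, final)
```

In the first case the word "ab" is accepted through the configurations {p}, {q}, and the empty
configuration.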

The work [28] applied IC3/PDR [15,46], implemented in IC3Ref [16], together with 
the backward BTS reduction to solve emptiness of BRE and obtained very encouraging 
results. The implementation used in [28], called Qzy, is, however, proprietary and not 
publicly available. A similar approach was taken by [47], where a string constraint was
translated to a multi-tape AFA and then to a BTS by the forward translation, and given
to IC3/PDR to solve through the tools nuXmv [22] or ABC [17]. Results of [77] seem to
indicate that the backward translation is better and the same is suggested by the com- 
parison in [27,28] in which the string solver Sloth [47], based on the forward reduction, 
was much slower than Qzy, based on the backward reduction. In this comparison, we 
include our own C++ implementation BWIC3 of the backward reduction based on the 
model checker ABC. 


Antichains. Antichain algorithms presented in [82] were the first breakthrough in solving
BRE. They use subsumption relations between the states of the automata constructed
by variations of the subset construction to prune the constructions. They were
used to test language universality and inclusion of NFAs and AFA emptiness. The AFA
emptiness test, namely, is based on an on-the-fly search for an accepting state of $A^f$ or for
an initial state of $A^b$. Subsumption prunes discovered states that are larger (smaller
for the backward algorithm) than others.
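
The forward pruning can be illustrated on NFA universality, the simplest of the problems listed
above. The sketch below keeps only subset-minimal macrostates of the subset construction: if a
rejecting macrostate is reachable from a larger macrostate, one is also reachable from any
smaller one. The function name and the dictionary encoding of transitions are assumptions made
for this example; it is not the implementation of [82] or of VATA.

```python
def is_universal(alphabet, transitions, initial, final):
    """Antichain-based universality test for an explicit NFA.

    transitions maps (state, symbol) to a set of successor states;
    final is a set of accepting states.
    """
    start = frozenset(initial)
    if not start & final:
        return False                  # the empty word is rejected
    antichain = [start]               # subset-minimal macrostates found so far
    work = [start]
    while work:
        macro = work.pop()
        for a in alphabet:
            succ = frozenset(t for s in macro
                             for t in transitions.get((s, a), ()))
            if not succ & final:
                return False          # a rejecting macrostate is reachable
            if any(old <= succ for old in antichain):
                continue              # subsumed by a smaller macrostate
            antichain = [old for old in antichain if not succ <= old] + [succ]
            work.append(succ)
    return True


sigma = {"a", "b"}
# State 0 loops on every symbol and is accepting, so every word is accepted.
trans = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {1}}
universal_case = is_universal(sigma, trans, {0}, {0, 1})
# Without the b-loop, the word "b" leads to the empty macrostate and is rejected.
not_universal = is_universal(sigma, {(0, "a"): {0}}, {0}, {0})
```

In the first example the macrostate {0, 1} is never explored because it is subsumed by {0}.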

The antichain algorithms were enhanced and generalized in a number of works,
e.g. with a more aggressive pruning by the simulation-based subsumption [6,36], or by
counterexample-guided abstraction refinement in [41]. In this comparison, we include
the NFA inclusion check implemented in the VATA tree automata library [59]. We
also experimented with a student-made implementation of the antichain AFA emptiness
check of [41] that uses abstraction refinement (the original implementation is no
longer maintained and we were not able to run it). However, not being able to achieve
competitive performance, we excluded it from the comparison. One reason for the poor
performance may be that the simplest form of AFA, the explicit DNF form (used in the original
version [41]), might be too inefficient and costly to construct in our examples, partly
due to a large number of minterms induced by the AFA emptiness benchmark.

We implemented (in C++) the antichain AFA emptiness test of [36] that integrates
tightly with a SAT solver to handle the general form of AFA with large alphabets. We
will refer to it as ANTISAT. We will briefly explain its principle. It essentially implements
the reachability test for the backward BTS discussed in the previous paragraph.
A configuration $c$ is represented by the conjunction
$\varphi_c = \bigwedge_{q \in Q \setminus c} \lnot q$. Note that $\varphi_c$ is
satisfied by the downward closure of $c$, which are all configurations included in (subsumed
by) $c$. To compute predecessors of configurations represented by $\varphi_c$, the SAT
solver (namely MiniSAT [37]) is called on the formula
$\Phi : \varphi^b_\Delta \land \varphi_c \land \psi_{ach}$. Here, $\psi_{ach}$
excludes all already discovered configurations from the solution. It is a conjunction of
clauses $\bar{\varphi}_c : \bigvee_{q \in Q \setminus c} q$ for every previously discovered configuration $c$.
The SAT solver discovers a satisfying assignment $e$, which is turned into a new configuration
$c' = Q \cap e$ (that is, the values of the symbol bits constituting the bit-vector $\alpha$ are
omitted from $e$). Unless $c'$ is initial, it is queued for further predecessor computation and is
immediately added to $\psi_{ach}$ through the interface of incremental SAT solving as the clause
$\bar{\varphi}_{c'}$. Finally, only maximal predecessors of $c$ are of interest, as the non-maximal ones
are subsumed by them. We enforce the maximality of $c'$ through working directly with
the internal SAT solver structures: at decision points, the SAT solver is forced to give
priority to decisions that assign 1 to state variables.
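
The backward search with maximal predecessors can be sketched without a SAT solver by
enumerating symbols explicitly. This is an illustration under stated assumptions, not the
ANTISAT implementation: transition and initial formulae are Python predicates assumed to be
monotone in the configuration, the final formula is assumed to be a conjunction of negated
non-accepting states (so that `accepting` is its largest model), and all names are hypothetical.

```python
def afa_empty_backward(states, alphabet, delta, initial, accepting):
    """Backward antichain emptiness test for an AFA.

    delta[q](symbol, config) and initial(config) stand in for Delta(q) and I;
    accepting is the largest configuration satisfying F.
    """
    start = frozenset(accepting)
    if initial(start):
        return False
    antichain = [start]               # subset-maximal configurations found so far
    work = [start]
    while work:
        c = work.pop()
        for a in alphabet:
            # The maximal predecessor of c under symbol a.
            pred = frozenset(q for q in states if delta[q](a, c))
            if any(pred <= old for old in antichain):
                continue              # subsumed by a larger configuration
            if initial(pred):
                return False          # an initial configuration reaches acceptance
            antichain = [old for old in antichain if not old <= pred] + [pred]
            work.append(pred)
    return True


states = {"p", "q"}
alphabet = {"a", "b"}
delta = {
    "p": lambda a, c: a == "a" and "q" in c,  # Delta(p) = a and q
    "q": lambda a, c: a == "b",               # Delta(q) = b
}
initial = lambda c: "p" in c                  # I = p

# F = not p and not q, so the largest final configuration is the empty one.
is_empty = afa_empty_backward(states, alphabet, delta, initial, set())
delta_dead = dict(delta, q=lambda a, c: False)
is_empty_dead = afa_empty_backward(states, alphabet, delta_dead, initial, set())
```

ANTISAT obtains each maximal predecessor from a single SAT query instead of iterating over
symbols, which matters when the alphabet is large.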


Bisimulation up-to Congruence. A later class of algorithms, here referred to as up-to
algorithms, checks equivalence as a bisimulation between configurations of AFA, and
utilises the up-to congruence technique to prune the search space. The first algorithm on
NFA equivalence [11] was extended to alternating automata emptiness check in [30].
These algorithms are close to antichains. As shown in [11], the pruning potential of the
up-to techniques is in theory the same or larger than that of antichains. A disadvantage
of the up-to congruence technique is the need for expensive evaluation of congruence
closures. The more extensive experiments of [39] show antichain algorithms as faster,
with the exception of randomly generated automata with small alphabets and very dense
transition relations. We include in the comparison the Java implementation of the
AFA-emptiness of [30] (emptiness reduces to equivalence with a trivial empty AFA), which we
refer to as BISIM. The other implementations of up-to algorithms we are aware of, from
[39] and [11], are single-purpose programs that decide equivalence of two NFAs, hence
we would be able to run them on a very small fraction of our benchmark only.


3.4 String Constraints Solvers


There are dozens of string constraint solvers that implement, to various degrees, support
for deciding combinations of regular properties. String languages are rich and BRE
are not the absolute priority of the solvers, hence they generally perform worse on them
than specialised tools. However, string solvers implement a wide range of unique
techniques and pragmatic heuristics that may work in specific instances. Representatives of
the solvers with the most mature implementations (also used in most comparisons in 
the literature) are Z3 [65,73] and CVC5 [7,68]. CVC5 solves BRE mostly through 
rewriting rules. Recently [73] extended Z3 with an approach based on the Antimirov 
derivative automata construction generalised to symbolic automata and extended regu- 
lar expressions. Essentially, the construction produces a symbolic AFA/BFA and checks 
its emptiness on the fly while running the forward de-alternation. As shown in [73], it is 
significantly more efficient in solving BRE than other SMT solvers (including CVC5). 
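
To give a flavour of how derivatives handle intersection and complement directly, here is a
minimal sketch of word membership for extended regular expressions. It deliberately swaps in the
simpler Brzozowski derivatives over explicit characters, not the symbolic Antimirov-style
construction of [73], and it only decides membership of a given word rather than emptiness. The
tuple encoding of expressions is an assumption made for this example.

```python
def nullable(r):
    """Does the language of r contain the empty word?"""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    raise ValueError(tag)


def derive(r, ch):
    """Brzozowski derivative of r with respect to the character ch."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return ("empty",)
    if tag == "chr":
        return ("eps",) if ch in r[1] else ("empty",)
    if tag == "cat":
        left = ("cat", derive(r[1], ch), r[2])
        return ("alt", left, derive(r[2], ch)) if nullable(r[1]) else left
    if tag in ("alt", "and"):
        return (tag, derive(r[1], ch), derive(r[2], ch))
    if tag == "not":
        return ("not", derive(r[1], ch))
    if tag == "star":
        return ("cat", derive(r[1], ch), r)
    raise ValueError(tag)


def matches(r, word):
    for ch in word:
        r = derive(r, ch)
    return nullable(r)


# Words over {a, b} that end in "a" but not in "aa".
any_star = ("star", ("chr", {"a", "b"}))
ends_a = ("cat", any_star, ("chr", {"a"}))
ends_aa = ("cat", ends_a, ("chr", {"a"}))
r = ("and", ends_a, ("not", ends_aa))
```

Without simplification the derivative expressions grow with every step; an emptiness check
additionally needs normalization of the derivatives to keep their number finite.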


3.5 Other Approaches and Tools 


Although we believe that we have collected a representative subset of existing algo- 
rithms and tools, we have not collected all interesting specimens. Some were not avail- 
able, some were difficult to run or prepare the inputs for, some seemed covered by 
experimentation in other works. Including these tools and algorithms in the comparison
could still be interesting and we leave it for future work (we plan to keep
extending the tool base as well as the benchmark set). Namely, the tool DPRLE [51],
used in the comparison in [28], seemed to be mostly outperformed by the IC3/PDR 
approach implemented in Qzy, however, not absolutely consistently. The implementa- 
tion of NFA antichain and up-to congruence techniques used in [39] seems efficient, 
with its NFA antichain inclusion twice as fast as that of VATA. The up-to congruence 
NFA equivalence checking of [11] could be fast too ([11] and [39] report somewhat 
conflicting results). There are numerous NFA/DFA libraries, e.g. the C alternative of 
BRICS [61] or the Java implementation of symbolic NFA of [29]. ALASKA [35] might 
contain interesting implementations of antichain algorithms but is no longer maintained 
and available. Our comparison is missing a basic implementation of antichain-powered 
de-alternation for explicit AFA in the DNF form, which, if not overwhelmed by a large 
number of minterms, could reach a good performance through simple fast data struc- 
tures, similarly to our ENFA. 


4 Benchmarks 


We collected as comprehensive a benchmark as possible, harvesting examples used in
previous works as well as generating some of our own. It is available together with
the whole experiment from [2] and on GitHub [1] (we plan to maintain and grow the
benchmark and welcome contributors).

The main focus of the current benchmark set is on the areas where most of the recent
development in solving AFA and BRE emptiness has happened, namely string
constraint solving and analysis of regular expressions used in analysing and filtering
texts. Atomic regular properties are here mostly given in the form of regular expressions
over UNICODE character classes. The alphabet is large but the number of minterms 
is mostly small or moderate. This is true also for our examples from regular model 
checking. Symbolic handling of complex transition relations over large alphabets is thus 
not absolutely crucial and the experiment can stay focused on the main algorithms for 
emptiness check. For that reason, we do not include benchmarks from solving WS1S
[21], the primary target of MONA, or Presburger arithmetic with automata [13,81],
where the techniques of handling symbolic alphabets are indispensable. Techniques
specialising in this kind of problem would deserve their own study. Our benchmarks
where the symbolic alphabet representation is still rather important are AFA coming
from (combinations of) LTL properties, with alphabets of sets of atomic propositions,
and from translations of string constraint problems to AFA with complex multi-track
alphabets.⁶


Boolean Combinations of Regular Expressions. This group of BRE contains bench- 
marks on which we can run all tools, including those based on NFA and DFA. They 
have small to moderate numbers of minterms (about 30 on average, at most over a
hundred).


b-smt contains 330 string constraints from Norn and SyGuS-qgen, collected in the
SMT-LIB benchmark [8], that fall in BRE. These were also used to compare SMT solvers
in [73].

b-hand-made has 56 difficult handwritten problems from [73] containing membership 
in regular expressions extended with intersection and complement. They encode (1) 
date and password problems, (2) problems where Boolean operations interact with 
concatenation and iteration, and (3) problems with exponential determinization. 

b-armc-incl contains 171 language inclusion problems from runs of abstract regular 
model checking tools (verification of the bakery algorithm, bubble sort, and a pro- 
ducer-consumer system) of [12]. These examples were used also in [11,39]. 

b-regex contains 500 problems, obtained analogously as in [30,77], of the form
$r_1 \land r_2 \land r_3 \land r_4 = r_1 \land r_2 \land r_3 \land r_4 \land r_5$, where each $r_i$ is
one of the 75 regexes⁷ from RegExLib [71] selected so that
$r_1 \land r_2 \land r_3 \land r_4 \land r_5$ is not empty. This benchmark
is inspired by spam filtering, where we want to test whether a new filter $r_5$ adds
anything to existing filters. We transformed this problem into the inclusion
$r_5 \supseteq r_1 \land r_2 \land r_3 \land r_4$, and kept the original form for BISIM which expects an equivalence.

b-param has 8 parametric problems. Four are from [40]: 
(1) [a-c]*a[a-c]{n+1} ∩ [a-c]*a[a-c]{n} (long strings),


6 We did not attempt to generate purely random problems. First, purely random automata gen- 
erated e.g. by [74] seem to have different characteristics than automata coming from practical 
problems (e.g. in [12,39]). Second, although generating random NFA is possible with a gen- 
erator controlled by three simple parameters which give a manageable parameter-value space 
covering all NFA, it is not clear how to similarly generate random AFA or BRE. On the other 
hand, we do include a benchmark based on randomly generated LTL formulae, which we con- 
sider relatively close to realistic LTL specifications. 

7 https://github.com/lorisdanto/symbolicautomata/blob/master/benchmarks/sre/main/java/
regexconverter/pattern%4075.txt.

(2) ⋂_i ([0-1]{i-1}0[0-1]{n-1}0[0-1]{n-i}α_i) | ([0-1]{i-1}1[0-1]{n-1}1[0-1]{n-i}α_i)
(exponential branching),

(3) ⋂_i .*(.{p_{i+1}})+α_i (exponential paths 1), and

(4) ⋂_i .+α_i(.{p_{i+1}})+ (exponential paths 2), where α_1,...,α_n are disjoint
character classes and p_j is the j-th prime number. Another four are from [28]:

(5) ^.[01]*.1.[01]{n}.$ \ ^.[01]*.0.[01]{n-1}.$ (sat. difference),

(6) ^.[01]*.1.1.[01]{n}.$ \ ^.[01]*.0.[01]{n+1}.$ (unsat. difference),

(7) ^.[01]*.1.[01]{n}.$ ∩ ^.[01]*.0.[01]{n-1}.$ (sat. intersection) and

(8) ^.[01]*.1.[01]{n}.$ ∩ ^.[01]*.0.[01]{n}.$ (unsat. intersection). For (1)
we chose n ∈ {50,100,...,500}, for (2)-(4) we chose n ∈ {2,3,...,60} and for
(5)-(8) we chose n ∈ {50,100,...,1000}.
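
The size of b-param follows from the parameter ranges above. The short check below recomputes
it, assuming a step of 50 in the ranges written with an ellipsis; the variable names are
illustrative only.

```python
# Parameter values for each group of families, as listed in the text.
long_strings = range(50, 501, 50)      # family (1): n = 50, 100, ..., 500
exponential = range(2, 61)             # families (2)-(4): n = 2, 3, ..., 60
difference_inter = range(50, 1001, 50) # families (5)-(8): n = 50, 100, ..., 1000

total = (1 * len(long_strings)
         + 3 * len(exponential)
         + 4 * len(difference_inter))
```

The resulting total of 267 problems agrees with the size reported for b-param in Table 1.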


AFA Benchmark. The second group of examples contains AFA not easily convertible
to BRE. Here we can run only tools that handle general AFA emptiness. Some of these
benchmarks also have large sets of minterms (easily reaching thousands) and complex
formulae in the AFA transition function, hence converting them to restricted forms
such as separated DNF or explicit may be very costly. This also seems to be the
main reason why our implementation of [41] could not compete.


a-ltlf-patterns comes from the transformation of linear temporal logic formulae over finite
traces (LTL$_f$) to AFA [34]. The 1699 formulae are from [60]⁸ and they represent
common LTL$_f$ patterns, which can be divided into two groups: (1) 7 parametric patterns
(100 each) and (2) randomly generated conjunctions of simpler LTL$_f$ patterns
(999 formulae).

a-ltl-rand contains 300 LTL$_f$ formulae obtained with the random generator of [77].
The generator traverses the syntactic tree of the LTL grammar, and is controlled by 
the number of variables, probabilities of connectives, maximum depth, and average 
depth. We have set the parameters empirically in a way likely to generate exam- 
ples difficult for the compared solvers (the formulae have 6 atomic propositions and 
maximum depth 16). 

a-ltl-param has a pair of hand-made parametric LTL$_f$ formulae (160 formulae each)
used in [30,77]: Lift [43] describes a simple lift operating on a parametric number 
of floors and Counter [72] describes a counter incremented modulo the parameter. 

a-ltlf-spec [60] contains 62 LTL$_f$ formulae that specify realistic systems, used by
Boeing [14] and NASA [42]. The formulae represent specifications used for designing
the Boeing AIR 6110 wheel-braking system and the NASA NextGen air traffic
control (ATC) system.

a-sloth contains 4062 AFA emptiness problems to which the string solver Sloth reduced string
constraints [47]. The AFA have complex multi-track transitions encoding Boolean 
operations and transductions, and a special kind of synchronization of traces requir- 
ing complex initial and final conditions. 

a-noodler contains 13840 AFA emptiness problems that correspond to certain sub-problems
solved within the string solver Noodler in [10]. The AFA were created similarly 
as those of a-sloth, but encode a different particular set of operations over different 
input automata. 


8 https://drive.google.com/file/d/1eOYGym3C8sQ-9iyfZ8qx42K54hgrFNTC.


5 The Comparison


We ran our experiments on Debian GNU/Linux 11, with a 3.4 GHz Intel Core processor,
8 CPU cores, and 20 GB RAM. All experiments were run with a timeout of 60 s
(increasing the timeout did not have a significant impact). Additional details as well as
the virtual machine with the entire benchmark are available at [2].


Benchmarking Infrastructure. The initial difficulty is that the tools expect different
input formats and forms of automata and the benchmarks come in different formats as
well. We converted all benchmarks to our internal AFA format, from which we generated
formats supported by the AFA handling tools JALTIMPACT, BWIC3, ANTISAT,
and BISIM, or we extended the tools with a parser. The BRE benchmarks come from
various sources. We first convert them into a master file which specifies the Boolean
combination of atomic NFA, each atomic NFA stored in a separate file. The SMT-LIB
format is generated for Z3 and CVC5. In the case of b-hand-made, b-param, and
b-smt, the atomic automata are translated from regular expressions using the parser
of BRICS, while in the case of b-regex, where the regexes contain features not supported
by BRICS, we use the parser from BISIM. b-smt and b-hand-made first require
translating from SMT-LIB to a regular expression. In the case of b-armc-incl, the atomic
automata come directly as NFAs, and are converted into formats of the individual BRE
solvers (we again wrote parsers for some of the solvers), and to our AFA format for the
AFA solvers. Every BRE solver was extended by an interpreter of the master file that
reads the NFA/DFA from the generated solver-specific files (except the SMT solvers,
which read SMT-LIB). We note that due to some difficulties with internal structures, we
currently cannot run BRICS on b-armc-incl, and due to the lack of a converter from
complex regular expressions and from pure NFA to the SMT format, we do not run Z3
and CVC5 on b-regex and on b-armc-incl.


Measured Data. We will present the results obtained with BRE (where we run all the 
tools) and with AFA emptiness (where we run BWIC3, ANTISAT, BISIM, and JALTIM- 
PACT) separately. We also separate the results on examples from applications from 
results on parametric hand-made examples. 

Table 1 summarizes the statistics from evaluating the benchmarks. The table lists: (i)
the average time, (ii) the median time, and (iii) the number of timeouts and number of
errors (mostly, a tool ran out of memory, made a bad alloc, or ran into a segmentation
fault). A few errors, e.g. in CVC5 or BISIM, were due to unsupported features in the
inputs. The tools’ performance is then visualised on cactus plots in Fig. 1. For each tool,
the plot shows the progress of the tool on each benchmark: the y axis is the cumulative
time taken on the benchmark, with the individual examples on the x axis ordered by the
runtime taken by the tool. Timeouts are omitted. In the appendix, we also show a set of
scatter plots that compare, for every benchmark, the three best performing tools.
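
The summary statistics and the points of a cactus plot can be computed from raw runtimes as
sketched below. The sketch assumes that a timeout is charged at the timeout value in the average
and the median (which is consistent with the medians of 60.0 in Table 1 but is our assumption,
not a documented detail of the evaluation scripts); the function names are hypothetical.

```python
from itertools import accumulate
from statistics import mean, median

TIMEOUT = 60.0  # seconds, as in the experimental setup


def summarize(runtimes, timeout=TIMEOUT):
    """runtimes: seconds per example, with None marking a timeout."""
    charged = [timeout if t is None else t for t in runtimes]
    return {
        "average": mean(charged),
        "median": median(charged),
        "timeouts": sum(t is None for t in runtimes),
    }


def cactus_points(runtimes):
    """Points (number of solved examples, cumulative time); timeouts are omitted."""
    solved = sorted(t for t in runtimes if t is not None)
    return list(enumerate(accumulate(solved), start=1))


runtimes = [0.5, None, 2.0, 0.5, None]
stats = summarize(runtimes)
points = cactus_points(runtimes)
```

Sorting before accumulating is what gives the cactus plot its monotone shape: the k-th point
shows the total time a tool needs for its k fastest examples.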

Finally, we compared the tools on the parametric benchmarks a-ltl-param and b-param.
We illustrate the results in Fig. 2. Each graph shows the times for the increasing
value of the specific parameter on the x axis.

Table 1. Summary of AFA and BRE benchmarks. Table lists (i) the average, (ii) the median, and
(iii) the number of timeouts and errors (in brackets). Winners are highlighted in bold.


             a-ltl-rand            a-ltlf-spec           a-ltlf-patterns       a-noodler             a-sloth               a-ltl-param
             (300)                 (62)                  (1 699)               (13 840)              (4 062)               (320)
             avg   med   TO(err)   avg   med   TO(err)   avg   med   TO(err)   avg   med   TO(err)   avg   med   TO(err)   avg   med   TO(err)
BWIC3        0.1   0.1   0         0.1   0.1   0         0.1   0.1   0         0.1   0.1   3         1.3   0.1   34        25.4  0.6   134
BISIM        4.4   1.0   8         32.9  60.0  32        37.0  60.0  1013      31.6  26.4  6644(8)   17.5  1.5   1087(10)  58.2  60.0  308
JALTIMPACT   7.9   2.3   12        2.4   1.4   ?         4.0   2.8   0         3.8   1.8   186       24.1  15.4  958       47.0  60.0  205
ANTISAT      18.3  0.1   84        0.0   0.0   0         31.0  60.0  868       0.4   0.0   57        14.9  0.0   991       58.3  60.0  310

             b-armc-incl           b-hand-made           b-regex               b-smt                 b-param
             (171)                 (56)                  (500)                 (330)                 (267)
             avg   med   TO(err)   avg   med   TO(err)   avg   med   TO(err)   avg   med   TO(err)   avg   med   TO(err)
BWIC3        ?     ?     1         0.4   0.1   0         0.2   0.1   0         0.1   0.1   0         44.9  60.0  191
BISIM        28.5  9.5   72        11.2  1.0   8         ?     1.3   15        2.5   2.5   0         55.4  60.0  240
BRICS        -                     3.9   0.4   3         5.8   0.8   40        0.3   0.3   0         52.7  60.0  228
CVC5         -                     27.4  0.8   10(15)    -                     0.8   0.2   1         48.6  60.0  208
AUTOMATA     3.5   0.4   9         0.2   0.2   0         0.2   0.2   0         0.2   0.2   0         46.3  60.0  161(42)
JALTIMPACT   30.9  24.6  63        11.1  3.6   5         22.5  2.4   48        3.5   3.5   0         57.8  60.0  252
ANTISAT      42.8  60.0  118       1.4   0.0   1         ?     ?     45        0.0   0.0   0         39.0  60.0  147
MONA         28.5  44.1  43        27.3  0.1   22(3)     41.0  60.0  15(298)   1.5   0.0   8         44.9  60.0  25(169)
ENFA         1.9   0.8   0         0.1   0.0   0         0.2   0.1   0         0.0   0.0   0         44.6  60.0  143(51)
VATA         2.6   3.4   0         0.1   0.0   0         2.1   0.2   ?         0.0   0.0   0         37.8  60.0  155(1)
Z3           -                     3.9   0.0   2         -                     0.4   0.0   2         32.0  48.1  129


5.1 Discussion 


Based on the measurements, we make several observations. 

Firstly, the tool which combines universality (it can be run on AFA as well as on
BRE emptiness) with the most consistently good performance is BWIC3. It dominates
most of the AFA emptiness benchmark, shows great or very good performance on
the BRE benchmark, and often stands out on the parametric examples. Moreover, the
measurements reported in [28] suggest that the backward BTS reduction has even more
potential. This is visible namely from the comparison of our results on the parametric
benchmarks diff-sat, diff-unsat, inter-sat, and inter-unsat. Our implementation matched
the result of [28] on diff-sat and partially on inter-sat, saw a worse trend on diff-unsat
and a much worse trend on inter-unsat. A likely culprit is a different underlying model
checker, ABC [17] in our implementation versus IC3Ref [16] in [28]. However, IC3Ref
was not used out of the box in [28]; harnessing it efficiently for problems of our kind is
not entirely trivial.

Secondly, the results on application related BRE (all BRE except the parametric 
examples in b-param) quite surprisingly favour the tools based mostly on relatively 
basic NFA algorithms. The overall best is the simplest tool of all, our implementa- 
tion ENFA of basic NFA constructions. Close to the performance of ENFA is VATA, 
which uses the antichain inclusion checking on b-armc-incl and b-regex (the fact that 
explicit complementation of ENFA is faster than the antichain of VATA suggests that the 
inclusion benchmarks are not particularly hard). VATA specialises to the more general 
tree automata, which probably causes unnecessary overhead. AUTOMATA also performs 
well. It uses slightly more advanced algorithms than ENFA (such as lazy evaluation of 
difference, though, without antichain pruning). Its symbolic representation of transition 
functions with BDDs probably does not provide much advantage here. This result
challenges the view that translating complex problems, arising for instance in string
constraint solving, into AFA in order to use the sophisticated machinery of AFA solvers
is an obvious silver bullet. Organizing the computation into smaller NFA operations,
where, moreover, partial results can be minimized and re-used, and a simpler and hence
more flexible NFA technology is used, might be a better strategy (this seems to work
very well for instance in our recent prototype string constraint solver [10]).

Fig. 1. Cactus plots of AFA and BRE benchmarks. The y axis is the cumulative time taken on the
benchmark in logarithmic scale; the examples on the x axis are ordered by the runtime of each tool.

Our AFA emptiness test ANTISAT, based on the antichain algorithm and a SAT
solver, has an interesting performance. As can be seen on the cactus plots, besides its
absolute domination on a-ltlf-spec, it is significantly faster than other tools on a large
portion of the other AFA emptiness benchmarks, but struggles on the rest. The examples
where it dominates are often automata with a structure resembling a lasso (or
several lassos) with a long handle. The other implementation of an antichain algorithm,
NFA/NTA inclusion in VATA, also shows good performance. Together, this points to
the overall strength of antichain algorithms.

Fig. 2. Models of runtime on parametric benchmarks based on specific parameter k with timeout
60 s. A sawtooth indicates that the tool failed on the benchmark for some k while solving the
benchmarks for k-1 and k+1. For brevity, we draw the models only until they start continually failing.

The SMT string constraint solvers are not among the best in the benchmarks related
to practical applications, but are competitive (especially Z3), and win on some parametric
cases. This may be because various heuristics unique to SMT solvers, especially
rewriting that reduces one type of constraint to another, kick in. For instance, Z3
seems to solve exppaths1 with the help of rewriting to the sub-string constraint in the
theory of sequences. In general, the measurements on parametric examples underscore
the fact that no algorithm is universally the best and their relative performance may vary
drastically depending on the kind of input.

Although the mediocre performance of the other tools can be partially explained by 
their focus on a different kind of a problem or a dated underlying technology, and each 
of them is respectable in its own right, a point can be made against relying on them 
as a baseline in comparisons of tools for solving our kind of problem. MONA, optimized
for a different setting (complex alphabets of bit-vectors with many minterms),
is held back by the implicit determinization, and, in our case, probably by the over- 
head of the symbolic representation. It also frequently runs out of the 32-bit address 
space for BDD nodes. Similarly for BRICS, which also always determinizes. The low 
performance of BISIM is surprising relative to the good results of the up-to algorithms 
reported in [11,30]. It is more consistent with [39], where up-to algorithms were not
winning against antichains on the more practical examples. Our results, however, do not
directly contradict the results of [30] itself, since it does not compare with the fast tools 
identified here and stands to a large degree on parametric and random benchmarks. 
There is also always the possibility that we have prepared the input in a way not ideal
for the tool. For instance, the transformation to the separated AFA, required by BISIM,
is not entirely trivial. Further investigation of this and a comparison with some other
implementation of the up-to techniques seem to be needed. The lack of raw speed
of JALTIMPACT on BRE and AFA emptiness is to be expected, considering that it is meant
for a different kind of system, AFA over data words. The stable trends shown in the
graphs suggest that an implementation of an interpolation-based abstraction refinement
optimized for BRE and AFA emptiness might have potential.


Main Takeaways. The backward reduction of AFA emptiness to BTS reachability in
combination with IC3 is very fast and extremely versatile, showing very good performance
on almost all benchmarks. However, on BRE related to a real-world application,
simple NFA algorithms actually tend to have the best raw performance, with the
simplest implementation of NFA being the best. Antichain algorithms also work well,
even significantly better than other algorithms on specific kinds of AFA. These seem to
be the tools to use. Reasonable implementations of the backward BTS reduction with
IC3, of antichains, and of basic NFA should also be the baseline of comparisons.

MONA and BRICS, based on DFA, as well as JALTIMPACT, focused on data words
rather than on pure regular properties, do not reach the performance of the best tools.
BISIM also did not confirm the power of up-to algorithms. SMT solvers, Z3 especially,
are competitive, but cannot be considered the top of the state of the art.

Generally, the particular kind and source of benchmark is a decisive factor influencing
the performance of tools, as is especially visible on the parametric benchmarks.


Threats to Validity. Our results must be taken with a grain of salt, as the experiment
contains inherent room for error. Although we tried to be as fair as possible, not
knowing every tool intimately, the conversions between formats and kinds of automata,
discussed at the start of Sect. 5, might have introduced biases into the experiment. Tools
are written in different languages and some have parameters which we might have used
in a sub-optimal way (we use the tools in their default settings), or, in the case of libraries,
we could have used a sub-optimal combination of functions. We also did not measure
memory peaks, which could be especially interesting, e.g., when the tools are deployed
in a cloud. We are, however, confident that our main conclusions are well justified
and the experiment gives a good overall picture. The entire experiment is available for
anyone to challenge or improve upon [2].


Abstract. We present an automated reasoning framework for synthesizing
recursion-free programs using saturation-based theorem proving.
Given a functional specification encoded as a first-order logical formula, 
we use a first-order theorem prover to both establish validity of this for- 
mula and discover program fragments satisfying the specification. As a 
result, when deriving a proof of program correctness, we also synthesize 
a program that is correct with respect to the given specification. We 
describe properties of the calculus that a saturation-based prover capa- 
ble of synthesis should employ, and extend the superposition calculus in 
a corresponding way. We implemented our work in the first-order prover 
VAMPIRE, extending the successful applicability of first-order proving to 
program synthesis. 


Keywords: Program Synthesis · Saturation · Superposition · Theorem Proving


1 Introduction 


Program synthesis constructs code from a given specification. In this work we 
focus on synthesis using functional specifications summarized by valid first-order 
formulas [1,14], ensuring that our programs are provably correct. While being a 
powerful alternative to formal verification [20], synthesis faces intrinsic compu- 
tational challenges. One of these challenges is posed to the reasoning backend 
used for handling program specifications, as the latter typically include first- 
order quantifier alternations and interpreted theory symbols. As such, efficient 
reasoning with both theories and quantifiers is imperative for any effort towards 
program synthesis. 

In this paper we address this demand for recursion-free programs. We advocate
the use of first-order theorem proving for extracting code from correctness
proofs of functional specifications given as first-order formulas $\forall \bar{x}. \exists y. F[\bar{x}, y]$.
These formulas state that “for all (program) inputs $\bar{x}$ there exists an output
$y$ such that the input-output relation (program computation) $F[\bar{x}, y]$ is valid”.
Given such a specification, we synthesize a recursion-free program while also
deriving a proof certifying that the program satisfies the specification.


© The Author(s) 2023

The programs we synthesize are built using first-order theory terms extended
with if-then-else constructors. To ensure that our programs yield computational
models, i.e., that they can be evaluated for given values of input variables
$\bar{x}$, we restrict the programs we synthesize to only contain computable symbols.


Our Approach in a Nutshell. In order to synthesize a recursion-free pro- 
gram, we prove its functional specification using saturation-based theorem prov- 
ing [11,15]. We extend saturation-based proof search with answer literals [5], 
allowing us to track substitutions into the output variable y of the specification. 
These substitutions correspond to the sought program fragments and are conditioned
on clauses they are associated with in the proof. When we derive a clause
corresponding to a program branch if C then r, where C is a condition and
r a term and both C, r are computable, we store it and continue proof search
assuming that $\neg C$ holds; we refer to such conditions C as (program) branch
conditions. The saturation process for both proof search and code construction
terminates when the conjunction of negations of the collected branch conditions 
becomes unsatisfiable. Then we synthesize the final program satisfying the given 
(and proved) specification by assembling the recorded program branches (see 
e.g. Examples 1-3). 

The main challenges of making our approach effective come with (i) inte- 
grating the construction of the programs with if —then—else into the proof 
search, turning thus proof search into program search/synthesis, and (ii) guiding 
program synthesis to only computable branch conditions and programs. 


Contributions. We make the following contributions, addressing the above challenges:¹


• We formalize the semantics for clauses with answer literals and introduce a
  saturation-based algorithm for program synthesis based on this semantics. We
  prove that, given a sound inference system, our saturation algorithm derives
  correct and computable programs (Sect. 4).

• We define properties of a sound inference calculus in order to make the calculus
  suitable for our saturation-based algorithm for program synthesis. We
  accordingly extend the superposition calculus and define a class of substitutions
  to be used within the extended calculus; we refer to these substitutions
  as computable unifiers (Sect. 5).

• We extend a first-order unification algorithm to find computable unifiers
  (Sect. 6) to be further used in saturation-based program synthesis.

• We implement our work in the VAMPIRE prover [11] and evaluate our synthesis
  approach on a number of examples, complementing other techniques in the
  area (Sect. 7). For example, our results demonstrate the applicability of our
  work on synthesizing programs for specifications that cannot even be encoded
  in the SyGuS syntax [16].


¹ Proofs of our results are given in the extended version [8] of our paper.


2 Preliminaries


We assume familiarity with standard multi-sorted first-order logic with equality.
We denote variables by $x, y$, terms by $s, t$, atoms by $A$, literals by $L$, clauses by
$C, D$, formulas by $F, G$, all possibly with indices. Further, we write $\sigma$ for Skolem
constants. We reserve the symbol $\square$ for the empty clause, which is logically
equivalent to $\bot$. Formulas and clauses with free variables are considered implicitly
universally quantified (i.e. we consider closed formulas). By $\simeq$ we denote
the equality predicate and write $t \not\simeq s$ as a shorthand for $\neg t \simeq s$. We use a
distinguished integer sort, denoted by $\mathbb{Z}$. When we use standard integer predicates
$<, \leq, >, \geq$, functions $+, -, \ldots$ and constants $0, 1, \ldots$, we assume that
they denote the corresponding interpreted integer predicates and functions with
their standard interpretations. Additionally, we include a conditional term constructor
if-then-else in the language, as follows: given a formula $F$ and terms
$s, t$ of the same sort, we write if $F$ then $s$ else $t$ to denote the term $s$ if $F$ is
valid and $t$ otherwise.

An expression is a term, literal, clause or formula. We write $E[t]$ to denote
that the expression $E$ contains the term $t$. For simplicity, $E[s]$ denotes the expression
$E$ where all occurrences of $t$ are replaced by the term $s$. A substitution $\theta$ is a
mapping from variables to terms. A substitution $\theta$ is a unifier of two expressions
$E$ and $E'$ if $E\theta = E'\theta$, and is a most general unifier (mgu) if for every unifier $\eta$
of $E$ and $E'$, there exists a substitution $\mu$ such that $\eta = \theta\mu$. We denote the mgu
of $E$ and $E'$ with $\mathrm{mgu}(E, E')$. We write $F_1, \ldots, F_n \vdash G_1, \ldots, G_m$ to denote that
$F_1 \land \ldots \land F_n \rightarrow G_1 \lor \ldots \lor G_m$ is valid, and extend the notation also to validity
modulo a theory $T$. Symbols occurring in a theory $T$ are interpreted and all
other symbols are uninterpreted.


2.1 Computable Symbols and Programs 


We distinguish between computable and uncomputable symbols in the signature. 
The set of computable symbols is given as part of the specification. Intuitively, 
a symbol is computable if it can be evaluated and hence is allowed to occur 
in a synthesized program. A term or a literal is computable if all symbols it 
contains are computable. A symbol, term or literal is uncomputable if it is not 
computable. 

A functional specification, or simply just a specification, is a formula

\[ \forall \bar{x}. \exists y. F[\bar{x}, y]. \qquad (1) \]

The variables $\bar{x}$ of a specification (1) are called input variables. Note that while
we use specifications with a single variable $y$, our work can analogously be used
with a tuple of variables $\bar{y}$ in (1).

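Let $\bar{\sigma}$ denote a tuple of Skolem constants. Consider a computable term $r[\bar{\sigma}]$
such that the instance $F[\bar{\sigma}, r[\bar{\sigma}]]$ of (1) holds. Since $\bar{\sigma}$ are fresh Skolem constants,
the formula $\forall \bar{x}. F[\bar{x}, r[\bar{x}]]$ also holds; we call such $r[\bar{x}]$ a program for (1) and say
that the program $r[\bar{x}]$ computes a witness of (1).


Superposition:

\[
\frac{\underline{s \simeq t} \lor C \qquad \underline{L[s']} \lor C'}{(L[t] \lor C \lor C')\theta}
\qquad
\frac{\underline{s \simeq t} \lor C \qquad \underline{u[s'] \not\simeq u'} \lor C'}{(u[t] \not\simeq u' \lor C \lor C')\theta}
\qquad
\frac{\underline{s \simeq t} \lor C \qquad \underline{u[s'] \simeq u'} \lor C'}{(u[t] \simeq u' \lor C \lor C')\theta}
\]

where $\theta := \mathrm{mgu}(s, s')$; $t\theta \not\succeq s\theta$; (first rule only) $L[s']$ is not an equality literal; and
(second and third rules only) $u'\theta \not\succeq u[s']\theta$.

Binary resolution:
\[ \frac{\underline{A} \lor C \qquad \underline{\neg A'} \lor C'}{(C \lor C')\theta} \quad \text{where } \theta := \mathrm{mgu}(A, A'). \]

Factoring:
\[ \frac{\underline{A} \lor A' \lor C}{(A \lor C)\theta} \quad \text{where } \theta := \mathrm{mgu}(A, A'). \]

Equality resolution:
\[ \frac{\underline{s \not\simeq t} \lor C}{C\theta} \quad \text{where } \theta := \mathrm{mgu}(s, t). \]

Equality factoring:
\[ \frac{\underline{s \simeq t} \lor s' \simeq t' \lor C}{(s \simeq t \lor t \not\simeq t' \lor C)\theta} \quad \text{where } \theta := \mathrm{mgu}(s, s');\ t\theta \not\succeq s\theta; \text{ and } t'\theta \not\succeq t\theta. \]


Fig. 1. The superposition calculus Sup.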


Further, if $\forall \bar{x}.(F_1 \land \ldots \land F_n \rightarrow F[\bar{x}, r[\bar{x}]])$ holds for computable formulas
$F_1, \ldots, F_n$, we write $\langle r[\bar{x}], \bigwedge_{i=1}^{n} F_i \rangle$ to refer to a program with conditions
$F_1, \ldots, F_n$ for (1). In the sequel, we refer to (parts of) programs with conditions also
as conditional branches. In Sect. 4 we show how to build programs for (1) by
composing programs with conditions for (1) (see Corollary 3).


2.2 Saturation and Superposition 


Saturation-based proof search implements proving by refutation [11]: to prove
validity of $F$, a saturation algorithm establishes unsatisfiability of $\neg F$. First-order
theorem provers work with clauses, rather than with arbitrary formulas. To
prove a formula $F$, first-order provers negate $F$, which is further skolemized and
converted to clausal normal form (CNF). The CNF of $\neg F$ is denoted by $\mathrm{cnf}(\neg F)$
and represents a set $S$ of initial clauses. First-order provers then saturate $S$ by
computing logical consequences of $S$ with respect to a sound inference system
$\mathcal{I}$. The saturated set of $S$ is called the closure of $S$ and the process of computing
the closure of $S$ is called saturation. If the closure of $S$ contains the empty clause
$\square$, the original set $S$ of clauses is unsatisfiable, and hence the formula $F$ is valid.
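
The refutation loop described above can be illustrated on a toy scale. The following is a minimal sketch, assuming a purely propositional setting with binary resolution as the only inference rule; it is not the superposition calculus and does not reflect how any particular prover is implemented. The names `resolve` and `saturate` and the integer encoding of literals are hypothetical choices made for illustration.

```python
from itertools import combinations

# Assumed encoding: a literal is a nonzero int (negative = negated atom),
# a clause is a frozenset of literals, and the empty clause is frozenset().
EMPTY = frozenset()


def resolve(c1, c2):
    """Return all binary resolvents of two propositional clauses."""
    resolvents = set()
    for lit in c1:
        if -lit in c2:
            resolvents.add((c1 - {lit}) | (c2 - {-lit}))
    return resolvents


def saturate(clauses):
    """Compute the closure of `clauses` under binary resolution.

    Returns True if the empty clause is derived (the set is unsatisfiable)
    and False if the closure is reached without deriving it.
    """
    closure = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(closure, 2):
            for r in resolve(c1, c2):
                if r == EMPTY:
                    return True
                if r not in closure:
                    new.add(r)
        if not new:          # nothing new can be derived: saturated
            return False
        closure |= new


# Proving (p and (p -> q)) -> q by refuting its negation:
# cnf(not F) = {p}, {not p, q}, {not q}, with p = 1 and q = 2.
negated = [frozenset({1}), frozenset({-1, 2}), frozenset({-2})]
refuted = saturate(negated)
```

Here `refuted` is true exactly when the negated formula is unsatisfiable, which corresponds to validity of the original formula.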

We may extend the set $S$ of initial clauses with additional clauses $C_1, \ldots, C_n$.
If $C$ is derived by saturating this extended set, we say $C$ is derived from $S$ under
additional assumptions $C_1, \ldots, C_n$.

The superposition calculus, denoted as Sup and given in Fig. 1, is the most 
common inference system used by saturation-based provers for first-order logic 
with equality [15]. The Sup calculus is parametrized by a simplification ordering
$\succ$ on terms and a selection function, which selects in each non-empty clause a
non-empty subset of literals (possibly also positive literals). We denote selected
literals by underlining them. An inference rule can be applied on the given
premise(s) if the literals that are underlined in the rule are also selected in the
premise(s). For a certain class of selection functions, the superposition calculus
Sup is sound (if $\square$ is derived from $F$, then $F$ is unsatisfiable) and refutationally
complete (if $F$ is unsatisfiable, then $\square$ can be derived from it).


2.3 Answer Literals 


Answer literals [5] provide a question answering technique for tracking substi- 
tutions into given variables throughout the proof. Suppose we want to find a 
witness for the validity of the formula 


\[ \exists y. F[y]. \qquad (2) \]


Within saturation-based proving, we first derive the skolemized negation of (2)
and add an answer literal using a fresh predicate ans with argument $y$, yielding


\[ \forall y. (\neg F[y] \lor \mathrm{ans}(y)). \qquad (3) \]


We then saturate the CNF of (3), while ensuring that answer literals are not
selected for performing inferences. If the clause $\mathrm{ans}(t_1) \lor \ldots \lor \mathrm{ans}(t_m)$ is derived
during saturation, note that this clause contains only answer literals in addition
to the empty clause; hence, in this case we proved unsatisfiability of $\forall y. \neg F[y]$,
implying validity of (2). Moreover, $t_1, \ldots, t_m$ provides a disjunctive answer, i.e.
witness, for the validity of (2); that is, $F[t_1] \lor \ldots \lor F[t_m]$ holds [12]. In particular,
if we derive the clause $\mathrm{ans}(t)$ during saturation, we found a definite answer $t$
for (2), namely $F[t]$ is valid.


Answer Literals with if-then-else. The derivation of disjunctive answers
can be avoided by modifying the inference rules to only derive clauses containing
at most one answer literal. One such modification is given within the A(R)-calculus
for binary resolution [22], where R is a so-called strongly liftable term
restriction. The A(R)-calculus replaces the binary resolution rule when both
premises contain an answer literal by the following A-resolution rule:


\[
\frac{A \lor C \lor \mathrm{ans}(r) \qquad \neg A' \lor C' \lor \mathrm{ans}(r')}{(C \lor C' \lor \mathrm{ans}(\mathtt{if}\ A\ \mathtt{then}\ r'\ \mathtt{else}\ r))\theta}
\quad \text{(A-resolution)},
\]


where $\theta := \mathrm{mgu}(A, A')$ and the restriction $R(\mathtt{if}\ A\ \mathtt{then}\ r'\ \mathtt{else}\ r)$ holds.

In our work we go beyond the A-resolution rule and modify both the superposition
calculus and the saturation algorithm to reason not only about answer
literals but also about their use of if-then-else terms (see Sects. 4-5).


3 Illustrative Example 


Let us illustrate our approach to program synthesis. We use answer literals in saturation
to construct programs with conditions while proving specifications (1).
By adding an answer literal to the skolemized negation of (1), we obtain


\[ \forall y. (\neg F[\bar{\sigma}, y] \lor \mathrm{ans}(y)), \]


where $\bar{\sigma}$ are the skolemized input variables $\bar{x}$. When we derive a unit clause
$\mathrm{ans}(r[\bar{\sigma}])$ during saturation, where $r[\bar{\sigma}]$ is a computable term, we construct a
program for (1) from the definite answer $r[\bar{\sigma}]$ by replacing $\bar{\sigma}$ with the input
variables $\bar{x}$, obtaining the program $r[\bar{x}]$. Hence, deriving computable definite
answers by saturation allows us to synthesize programs for specifications.


\[
\forall x.\ i(x) * x \simeq e \quad \text{(A1)} \qquad
\forall x.\ e * x \simeq x \quad \text{(A2)} \qquad
\forall x, y, z.\ x * (y * z) \simeq (x * y) * z \quad \text{(A3)}
\]


Fig. 2. Axioms defining a group. Uninterpreted function symbols $i(\cdot)$, $e$, $*$ represent the
inverse, the identity element, and the group operation, respectively.


Example 1. Consider the group theory axioms (A1)-(A3) of Fig. 2. We are interested
in synthesizing a program for the following specification:


\[ \forall x. \exists y.\ x * y \simeq e \qquad (4) \]


In this example we assume that all symbols are computable. To synthesize a
program for (4), we add an answer literal to the skolemized negation of (4) and
convert the resulting formula to CNF (preprocessing). We consider the set $S$ of
clauses containing the obtained CNF and the axioms (A1)-(A3). We saturate $S$
using Sup and obtain the following derivation:²


1. $\sigma * y \not\simeq e \lor \mathrm{ans}(y)$    [preprocessed specification]
2. $i(x) * (x * y) \simeq e * y$    [Sup A1, A3]
3. $i(x) * (x * y) \simeq y$    [Sup A2, 2.]
4. $x * y \simeq i(i(x)) * y$    [Sup 3., 3.]
5. $e \simeq x * i(x)$    [Sup 4., A1]
6. $\mathrm{ans}(i(\sigma))$    [BR 5., 1.]


Using the above derivation, we construct a program for the functional specification
(4) as follows: we replace $\sigma$ in the definite answer $i(\sigma)$ by $x$, yielding the
program $i(x)$. Note that for each input $x$, our synthesized program computes the
inverse $i(x)$ of $x$ as an output. In other words, our synthesized program for (4)
ensures that each group element $x$ has a right inverse $i(x)$.
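
As a sanity check of Example 1, the synthesized program can be evaluated in a concrete model. The sketch below assumes the symmetric group on three elements as the interpretation of the group symbols; this choice of model, and the names `op`, `inv`, `axioms_hold`, `program` and `satisfies_spec`, are illustrative assumptions and not part of the derivation itself.

```python
from itertools import permutations, product

# Assumed model: permutations of {0, 1, 2} under composition (a non-abelian group).
ELEMENTS = list(permutations(range(3)))
e = (0, 1, 2)  # identity permutation


def op(p, q):
    """Interpretation of the group operation: (p * q)(k) = p(q(k))."""
    return tuple(p[q[k]] for k in range(3))


def inv(p):
    """Interpretation of i(.): the inverse permutation."""
    result = [0] * 3
    for k, image in enumerate(p):
        result[image] = k
    return tuple(result)


def axioms_hold():
    """Check (A1)-(A3) exhaustively in the assumed model."""
    a1 = all(op(inv(x), x) == e for x in ELEMENTS)
    a2 = all(op(e, x) == x for x in ELEMENTS)
    a3 = all(op(x, op(y, z)) == op(op(x, y), z)
             for x, y, z in product(ELEMENTS, repeat=3))
    return a1 and a2 and a3


def program(x):
    """The synthesized program i(x) from Example 1."""
    return inv(x)


def satisfies_spec():
    """Check specification (4): x * program(x) equals e for every input x."""
    return all(op(x, program(x)) == e for x in ELEMENTS)
```

Such a check only confirms the result in one model; the saturation proof is what establishes it for all groups.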


While Example 1 yields a definite answer within saturation-based proof 
search, our work supports the synthesis of more complex recursion-free pro- 
grams (see Examples 2-3) by composing program fragments derived in the pro- 
gram search (Sect. 4) as well as by using answer literals with if—then—else to 
effectively handle disjunctive answers (Sect. 5). 


² For each formula in the derivation, we also list how the formula has been derived. For
example, formula 5 is the result of superposition (Sup) with formula 4 and axiom A1,
whereas binary resolution (BR) has been used to derive formula 6 from 5 and 1.


4 Program Synthesis with Answer Literals


We now introduce our approach to saturation-based program synthesis using 
answer literals (Algorithm 1). We focus on recursion-free program synthesis and 
present our work in a more general setting. Namely, we consider functional specifications
whose validity may depend on additional assumptions (e.g. additional
program requirements) $A_1, \ldots, A_n$, where each $A_i$ is a closed formula:

\[ A_1 \land \ldots \land A_n \rightarrow \forall \bar{x}. \exists y. F[\bar{x}, y] \qquad (5) \]

Note that specification (1) is a special case of (5). However, since $A_1, \ldots, A_n$ are
closed formulas, (5) is equivalent to $\forall \bar{x}. \exists y. (A_1 \land \ldots \land A_n \rightarrow F[\bar{x}, y])$, which is a
special case of (1).

Given a functional specification (5), we use answer literals to synthesize programs
with conditions (Sect. 4.1) and extend saturation-based proof search to
reason about answer literals (Sect. 4.2). For doing so, we add the answer literal
$\mathrm{ans}(y)$ to the skolemized negation of (5) and obtain


\[ A_1 \land \ldots \land A_n \land \forall y. (\neg F[\bar{\sigma}, y] \lor \mathrm{ans}(y)). \qquad (6) \]


We saturate the CNF of (6), while ensuring that answer literals are not selected
within the inference rules used in saturation. We guide saturation-based proof
search to derive clauses $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$, where $C[\bar{\sigma}]$ and $r[\bar{\sigma}]$ are computable.


4.1 From Answer Literals to Programs 


Our next result ensures that, if we derive the clause $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$, the term $r[\bar{\sigma}]$
is a definite answer under the assumption $\neg C[\bar{\sigma}]$ (Theorem 1). We note that we do
not terminate saturation-based program synthesis once a clause $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$
is derived. We rather record the program $r[\bar{x}]$ with condition $\neg C[\bar{x}]$ (and possibly
also other conditions), replace clause $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$ by $C[\bar{\sigma}]$, and continue saturation
(Corollary 2). As a result, upon establishing validity of (5), we have synthesized
a program for (5) (Corollary 3).


Theorem 1 [Semantics of Clauses with Answer Literals]. Let $C$ be a
clause not containing an answer literal. Assume that, using a saturation algorithm
based on a sound inference system $\mathcal{I}$, the clause $C \lor \mathrm{ans}(r[\bar{\sigma}])$ is derived
from the set of clauses consisting of initial assumptions $A_1, \ldots, A_n$, the clausified
formula $\mathrm{cnf}(\neg F[\bar{\sigma}, y] \lor \mathrm{ans}(y))$ and additional assumptions $C_1, \ldots, C_m$. Then,


\[ A_1, \ldots, A_n, C_1, \ldots, C_m \vdash C, F[\bar{\sigma}, r[\bar{\sigma}]]. \]


That is, under the assumptions $C_1, \ldots, C_m, \neg C$, the computable term $r[\bar{\sigma}]$ provides
a definite answer to (5).


We further use Theorem 1 to synthesize programs with conditions for (5). 


314 P. Hozzova et al. 


Corollary 2 [Programs with Conditions]. Let $r[\bar{\sigma}]$ be a computable term
and $C[\bar{\sigma}]$ a ground computable clause not containing an answer literal. Assume
that clause $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$ is derived from the set of initial clauses $A_1, \ldots, A_n$,
the clausified formula $\mathrm{cnf}(\neg F[\bar{\sigma}, y] \lor \mathrm{ans}(y))$ and additional ground computable
assumptions $C_1[\bar{\sigma}], \ldots, C_m[\bar{\sigma}]$, by using saturation based on a sound inference
system $\mathcal{I}$. Then,

\[ \langle r[\bar{x}], \bigwedge_{j=1}^{m} C_j[\bar{x}] \land \neg C[\bar{x}] \rangle \]

is a program with conditions for (5).


Note that a program with conditions $\langle r[\bar{x}], \bigwedge_{j=1}^{m} C_j[\bar{x}] \land \neg C[\bar{x}] \rangle$ corresponds
to a conditional (program) branch if $\bigwedge_{j=1}^{m} C_j[\bar{x}] \land \neg C[\bar{x}]$ then $r[\bar{x}]$: only if the
condition $\bigwedge_{j=1}^{m} C_j[\bar{x}] \land \neg C[\bar{x}]$ is valid, then $r[\bar{x}]$ is computed for (5).

We use programs with conditions $\langle r[\bar{x}], \bigwedge_{j=1}^{m} C_j[\bar{x}] \land \neg C[\bar{x}] \rangle$ to finally synthesize
a program for (5). To this end, we use Corollary 2 to derive programs with
conditions, and once their conditions cover all possible cases given the initial
assumptions $A_1, \ldots, A_n$, we compose them into a program for (5).


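Corollary 3 [From Programs with Conditions to Programs for (5)].
Let $P_1[\bar{x}], \ldots, P_k[\bar{x}]$, where $P_i[\bar{x}] = \langle r_i[\bar{x}], \bigwedge_{j=1}^{i-1} C_j[\bar{x}] \land \neg C_i[\bar{x}] \rangle$, be programs with
conditions for (5), such that $\bigwedge_{i=1}^{n} A_i \land \bigwedge_{i=1}^{k} C_i[\bar{x}]$ is unsatisfiable. Then $P[\bar{x}]$,
given by

\[
\begin{array}{rl}
P[\bar{x}] := & \mathtt{if}\ \neg C_1[\bar{x}]\ \mathtt{then}\ r_1[\bar{x}] \\
 & \mathtt{else\ if}\ \neg C_2[\bar{x}]\ \mathtt{then}\ r_2[\bar{x}] \\
 & \ldots \\
 & \mathtt{else\ if}\ \neg C_{k-1}[\bar{x}]\ \mathtt{then}\ r_{k-1}[\bar{x}] \\
 & \mathtt{else}\ r_k[\bar{x}],
\end{array}
\qquad (7)
\]

is a program for (5).


Note that since the conditional branches of (7) cover all possible cases to be
considered over $\bar{x}$, we do not need the condition if $\neg C_k$. In particular, if $k = 1$,
i.e. $\bigwedge_{i=1}^{n} A_i \land C_1[\bar{x}]$ is unsatisfiable, then the synthesized program for (5) is $r_1[\bar{x}]$.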
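
The composition of conditional branches into the nested conditional (7) can be sketched in a few lines. This is a minimal illustration, assuming that each condition $C_i$ is available as an executable predicate and each $r_i$ as a function of the inputs; the helper name `compose` and the maximum-of-two-integers branches are hypothetical and chosen only to make the construction concrete.

```python
def compose(branches):
    """Build a program of the shape (7) from a list of (C_i, r_i) pairs.

    Each C_i is a predicate on the inputs and each r_i a function of the
    inputs. Branch i fires when C_i does not hold. The condition of the
    last branch is never evaluated, mirroring the final `else r_k` of (7).
    """
    if not branches:
        raise ValueError("at least one program with conditions is required")

    def program(*inputs):
        for condition, result in branches[:-1]:
            if not condition(*inputs):
                return result(*inputs)
        return branches[-1][1](*inputs)

    return program


# Hypothetical branches for a maximum-of-two-integers specification:
# C_1 is x1 < x2 and C_2 is x2 < x1; their conjunction is unsatisfiable,
# so the two branches together cover all inputs.
max_program = compose([
    (lambda x1, x2: x1 < x2, lambda x1, x2: x1),
    (lambda x1, x2: x2 < x1, lambda x1, x2: x2),
])
```

Calling `max_program(3, 5)` skips the first branch because $C_1$ holds and falls through to the final `else`, returning the second argument.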


4.2 Saturation-Based Program Synthesis 


Our program synthesis results from Theorem 1, Corollary 2 and Corollary 3
rely upon a saturation algorithm using a sound (but not necessarily complete)
inference system $\mathcal{I}$. In this section, we present our modifications to extend
state-of-the-art saturation algorithms with answer literal reasoning, allowing us to derive
clauses $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$, where both $C[\bar{\sigma}]$ and $r[\bar{\sigma}]$ are computable. In Sects. 5-6
we then describe modifications of the inference system $\mathcal{I}$ to implement rules over
clauses with answer literals.


Algorithm 1. Saturation Loop for Recursion-Free Program Synthesis

 1  initial set of clauses $S := \{\mathrm{cnf}(A_1 \land \ldots \land A_n \land \forall y.(\neg F[\bar{\sigma}, y] \lor \mathrm{ans}(y)))\}$
 2  initial sets of additional assumptions $\mathcal{C} := \emptyset$ and programs $\mathcal{P} := \emptyset$
 3  repeat
 4      Select clause $G \in S$
 5      Derive consequences $C_1, \ldots, C_n$ of $G$ and formulas from $S$ using rules of $\mathcal{I}$
 6      for each $C_i$ do
 7          if $C_i = (C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}]))$ and $C[\bar{\sigma}]$ is ground and computable then
 8              $\mathcal{P} := \mathcal{P} \cup \{\langle r[\bar{x}], \bigwedge_{C' \in \mathcal{C}} C' \land \neg C[\bar{x}] \rangle\}$    /* Corollary 2 */
 9              $\mathcal{C} := \mathcal{C} \cup \{C[\bar{x}]\}$
10              $C_i := C[\bar{\sigma}]$
11      $S := S \cup \{C_1, \ldots, C_n\}$
12      if $\square \in S$ then
13          return program (7) for specification (5), derived from $\mathcal{P}$    /* Corollary 3 */


Our saturation algorithm is given in Algorithm 1. In a nutshell, we use Corollary 2
to construct programs from clauses $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$ and replace clauses
$C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$ by $C[\bar{\sigma}]$ (lines 7-10 of Algorithm 1). The newly added computable
assumptions $C[\bar{\sigma}]$ are used to guide saturation towards deriving programs
with conditions where the conditions contain $C[\bar{x}]$; these programs with
conditions are used for synthesizing programs for (5), as given in Corollary 3.

Compared to a standard saturation algorithm used in first-order theorem
proving (e.g. lines 4-5 of Algorithm 1), Algorithm 1 implements additional steps
for processing newly derived clauses $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$ with answer literals (lines
6-10). As a result, Algorithm 1 establishes not only the validity of the specification
(5) but also synthesizes a program (lines 12-13). Throughout the algorithm,
we maintain a set $\mathcal{P}$ of programs with conditions derived so far and a set $\mathcal{C}$ of
additional assumptions. For each new clause $C_i$, we check if it is in the form
$C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$ where $C[\bar{\sigma}]$ is ground and computable (line 7). If yes, we construct
a program with conditions $\langle r[\bar{x}], \bigwedge_{C' \in \mathcal{C}} C' \land \neg C[\bar{x}] \rangle$, extend $\mathcal{C}$ with the
additional assumption $C[\bar{x}]$, and replace $C_i$ by $C[\bar{\sigma}]$ (lines 8-10). Then, when
we derive the empty clause, we construct the final program as follows. We first
collect all clauses that participated in the derivation of $\square$. We use this clause
collection to filter the programs in $\mathcal{P}$: we only keep a program originating from
a clause $C[\bar{\sigma}] \lor \mathrm{ans}(r[\bar{\sigma}])$ if the condition $C[\bar{\sigma}]$ was used in the proof, obtaining
programs $P_1, \ldots, P_k$. From $P_1, \ldots, P_k$ we then synthesize the final program $P$
using the construction (7) from Corollary 3.
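
The bookkeeping of lines 7-10 and the final filtering step can be sketched as follows. This is an illustration of the record keeping only, under strong simplifying assumptions: clauses are opaque strings, the inference engine is not modelled, and whether a clause body is ground and computable is supplied as a flag. The names `Clause`, `Bookkeeper`, `process` and `final_branches` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    body: str                      # the non-answer part C, kept as an opaque string
    answer: Optional[str] = None   # the argument r of ans(r), if present
    computable: bool = True        # assumed given: C is ground and computable


class Bookkeeper:
    def __init__(self):
        self.assumptions = []   # additional assumptions collected so far
        self.programs = []      # recorded programs with conditions

    def process(self, clause):
        """Lines 7-10: record a program with conditions, strip the answer literal."""
        if clause.answer is None or not clause.computable:
            return clause
        # Condition = conjunction of all earlier assumptions and the negation of C,
        # stored here as the pair (earlier assumptions, C).
        condition = (tuple(self.assumptions), clause.body)
        self.programs.append((clause.answer, condition))
        self.assumptions.append(clause.body)
        return Clause(clause.body)

    def final_branches(self, used_in_proof):
        """Keep only programs whose condition C took part in the refutation."""
        return [(r, cond) for r, cond in self.programs if cond[1] in used_in_proof]


bk = Bookkeeper()
first = bk.process(Clause("x1 < x2", answer="x1"))
second = bk.process(Clause("x2 < x1", answer="x2"))
```

After these two calls the second recorded program carries the first clause body as an accumulated assumption, which matches the shape of the conditions in Corollary 3.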


Remark 1. Compared to [22] where potentially large programs (with conditions) 
are tracked in answer literals, Algorithm 1 removes answer literals from clauses 
and constructs the final program only after saturation found a refutation of the 
negated (5). Our approach has two advantages: first, we do not have to keep 
track of potentially many large terms using if—then—else, which might slow 
down saturation-based program synthesis. Second, our work can naturally be 
integrated with clause splitting techniques within saturation (see Sect. 7). 




5 Superposition with Answer Literals 


We note that our saturation-based program synthesis approach is not restricted 
to a specific calculus. Algorithm 1 can thus be used with any sound set of 
inference rules, including theory-specific inference rules, e.g. [10], as long as the
rules allow derivation of clauses in the form $C \lor \mathrm{ans}(r)$, where $C, r$ are computable
and $C$ is ground. I.e., the rules should only derive clauses with at most one answer
literal, and should not introduce uncomputable symbols into answer literals.

In this section we present changes tailored to the superposition calculus Sup, 
yet, without changing the underlying saturation process of Algorithm 1. We first 
introduce the notion of an abstract unifier [17] and define a computable unifier — 
a mechanism for dealing with the uncomputable symbols in the reasoning instead 
of introducing them into the programs. The use of such a unifier in any sound 
calculus is explained, with particular focus on the Sup calculus. 


Definition 1 (Abstract unifier [17]). An abstract unifier of two expressions
$E_1, E_2$ is a pair $(\theta, D)$ such that:


1. $\theta$ is a substitution and $D$ is a (possibly empty) disjunction of disequalities,
2. $(D \lor E_1 \simeq E_2)\theta$ is valid in the underlying theory.


Intuitively speaking, an abstract unifier combines disequality constraints $D$ with
a substitution $\theta$ such that the substitution is a unifier of $E_1, E_2$ if the constraints
$D$ are not satisfied.


Definition 2 (Computable unifier). A computable unifier of two expressions
$E_1, E_2$ with respect to an expression $E_3$ is an abstract unifier $(\theta, D)$ of $E_1, E_2$
such that the expression $E_3\theta$ is computable.


For example, let $f$ be computable and $g$ uncomputable. Then $(\{y \mapsto f(z)\},\ z \not\simeq g(x))$
is a computable unifier of the terms $f(g(x)), y$ with respect to $f(y)$.
Further, $(\{y \mapsto f(g(x))\}, \emptyset)$ is an abstract unifier of the same terms, but not a
computable unifier with respect to $f(y)$.
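
The computability requirement of Definition 2 can be checked mechanically on the example above. The sketch below assumes terms are encoded as nested tuples with variables as plain strings; the helper names `apply`, `symbols`, `is_computable` and `keeps_computable` are hypothetical. It checks only that $E_3\theta$ is computable and does not verify the validity condition of Definition 1.

```python
def apply(term, subst):
    """Apply a substitution (dict from variable names to terms) to a term."""
    if isinstance(term, str):                 # a variable
        return subst.get(term, term)
    head, *args = term
    return (head, *(apply(a, subst) for a in args))


def symbols(term):
    """Collect the function symbols occurring in a term."""
    if isinstance(term, str):
        return set()
    head, *args = term
    return {head}.union(*(symbols(a) for a in args))


def is_computable(term, computable_symbols):
    """A term is computable if all symbols it contains are computable."""
    return symbols(term) <= computable_symbols


def keeps_computable(subst, e3, computable_symbols):
    """Extra requirement of Definition 2: E3 under the substitution is computable."""
    return is_computable(apply(e3, subst), computable_symbols)


COMPUTABLE = {"f"}                 # f computable, g uncomputable
E3 = ("f", "y")                    # the term f(y)
theta_abstracted = {"y": ("f", "z")}          # {y -> f(z)}, with constraint z != g(x)
theta_plain = {"y": ("f", ("g", "x"))}        # {y -> f(g(x))}, no constraint

ok_abstracted = keeps_computable(theta_abstracted, E3, COMPUTABLE)
ok_plain = keeps_computable(theta_plain, E3, COMPUTABLE)
```

The first substitution keeps $f(y)$ computable because the uncomputable subterm $g(x)$ has been moved into the constraint, whereas the second introduces $g$ into the instantiated term.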

Ensuring Computability of Answer Literal Arguments. We modify the
rules of a sound inference system $\mathcal{I}$ to use computable unifiers with respect to
the answer literal argument instead of unifiers. Since a computable unifier may
entail disequality constraints $D$, we add $D$ to the conclusions of the inference
rules. That is, for an inference rule of $\mathcal{I}$ as below

\[ \frac{C_1 \quad \cdots \quad C_n}{C\theta} \qquad (8) \]

where $\theta$ is a substitution such that $E\theta \simeq E'\theta$ holds for some expressions $E, E'$,

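we extend $\mathcal{I}$ with the following $n$ inference rules with computable unifiers:


\[
\frac{C_1 \lor \mathrm{ans}(r) \qquad C_2 \quad \cdots \quad C_n}{(D \lor C \lor \mathrm{ans}(r))\theta'}
\qquad \cdots \qquad
\frac{C_1 \quad C_2 \quad \cdots \quad C_n \lor \mathrm{ans}(r)}{(D \lor C \lor \mathrm{ans}(r))\theta'}
\qquad (9)
\]


where $(\theta', D)$ is a computable unifier of $E, E'$ with respect to $r$ and none of
$C_1, \ldots, C_n$ contains an answer literal. We obtain the following result.


Superposition (Sup):

\[
\frac{\underline{s \simeq t} \lor C \lor \mathrm{ans}(r) \qquad \underline{L[s']} \lor C' \lor \mathrm{ans}(r')}{(D \lor L[t] \lor C \lor C' \lor \mathrm{ans}(\mathtt{if}\ s \simeq t\ \mathtt{then}\ r'\ \mathtt{else}\ r))\theta}
\qquad
\frac{\underline{s \simeq t} \lor C \lor \mathrm{ans}(r) \qquad \underline{L[s']} \lor C' \lor \mathrm{ans}(r')}{(D \lor r \not\simeq r' \lor L[t] \lor C \lor C' \lor \mathrm{ans}(r))\theta}
\]

\[
\frac{\underline{s \simeq t} \lor C \lor \mathrm{ans}(r) \qquad \underline{u[s'] \not\simeq u'} \lor C' \lor \mathrm{ans}(r')}{(D \lor u[t] \not\simeq u' \lor C \lor C' \lor \mathrm{ans}(\mathtt{if}\ s \simeq t\ \mathtt{then}\ r'\ \mathtt{else}\ r))\theta}
\qquad
\frac{\underline{s \simeq t} \lor C \lor \mathrm{ans}(r) \qquad \underline{u[s'] \not\simeq u'} \lor C' \lor \mathrm{ans}(r')}{(D \lor r \not\simeq r' \lor u[t] \not\simeq u' \lor C \lor C' \lor \mathrm{ans}(r))\theta}
\]

\[
\frac{\underline{s \simeq t} \lor C \lor \mathrm{ans}(r) \qquad \underline{u[s'] \simeq u'} \lor C' \lor \mathrm{ans}(r')}{(D \lor u[t] \simeq u' \lor C \lor C' \lor \mathrm{ans}(\mathtt{if}\ s \simeq t\ \mathtt{then}\ r'\ \mathtt{else}\ r))\theta}
\qquad
\frac{\underline{s \simeq t} \lor C \lor \mathrm{ans}(r) \qquad \underline{u[s'] \simeq u'} \lor C' \lor \mathrm{ans}(r')}{(D \lor r \not\simeq r' \lor u[t] \simeq u' \lor C \lor C' \lor \mathrm{ans}(r))\theta}
\]

where $(\theta, D)$ is a computable unifier of $s, s'$ w.r.t. the argument of the answer
literal in the rule conclusion (i.e. if $s \simeq t$ then $r'$ else $r$ for the left-column rules,
and $r$ for the others); (rules on the first line only) $L[s']$ is not an equality literal;
and (rules on the second and third line only) $u'\theta \not\succeq u[s']\theta$.


Binary resolution (BR):

\[
\frac{\underline{A} \lor C \lor \mathrm{ans}(r) \qquad \underline{\neg A'} \lor C' \lor \mathrm{ans}(r')}{(D \lor C \lor C' \lor \mathrm{ans}(\mathtt{if}\ A\ \mathtt{then}\ r'\ \mathtt{else}\ r))\theta}
\qquad
\frac{\underline{A} \lor C \lor \mathrm{ans}(r) \qquad \underline{\neg A'} \lor C' \lor \mathrm{ans}(r')}{(D \lor r \not\simeq r' \lor C \lor C' \lor \mathrm{ans}(r))\theta}
\]

where $(\theta, D)$ is a computable unifier of $A, A'$ w.r.t. (first rule) if $A$ then $r'$ else $r$
or (second rule) $r$.


Factoring (F):
\[ \frac{\underline{A} \lor A' \lor C \lor \mathrm{ans}(r)}{(D \lor A \lor C \lor \mathrm{ans}(r))\theta} \]
where $(\theta, D)$ is a computable unifier of $A, A'$ w.r.t. $r$.

Equality resolution (ER):
\[ \frac{\underline{s \not\simeq t} \lor C \lor \mathrm{ans}(r)}{(D \lor C \lor \mathrm{ans}(r))\theta} \]
where $(\theta, D)$ is a computable unifier of $s, t$ w.r.t. $r$.

Equality factoring (EF):
\[ \frac{\underline{s \simeq t} \lor s' \simeq t' \lor C \lor \mathrm{ans}(r)}{(D \lor s \simeq t \lor t \not\simeq t' \lor C \lor \mathrm{ans}(r))\theta} \]
where $(\theta, D)$ is a computable unifier of $s, s'$ w.r.t. $r$; $t\theta \not\succeq s\theta$; and $t'\theta \not\succeq t\theta$.


Fig. 3. Selected rules of the extended superposition calculus Sup for reasoning with
answer literals, with underlined literals being selected.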


Lemma 4 [Soundness of Inferences with Answer Literals]. If the rule (8) 
is sound, the rules (9) are sound as well. 


We note that we keep the original rule (8) in the inference system, but impose that none of its premises $C_1, \ldots, C_n$ contains an answer literal. Clearly, neither the rule (8) modified in this way nor the new rules (9) introduce uncomputable symbols into answer literals. Rather, these rules add disequality constraints $D$ to their conclusions and immediately select $D$ for further applications of inference rules. Such a selection guides the saturation process in Algorithm 1 to first discharge the constraints $D$ containing uncomputable symbols, with the aim of deriving a clause $C' \lor ans(r')$ where $C'$ is computable. The clause $C' \lor ans(r')$ is then converted into a program with conditions using Corollary 2.


Superposition with Answer Literals. We make the inference rule modifications (8), together with the addition of new rules (9), for each inference rule of the Sup calculus from Fig. 1. Further, we also ensure that rules with multiple premises, when applied on several premises containing answer literals, derive clauses with at most one answer literal. We therefore introduce the following two rule modifications. (i) We use the if-then-else constructor to combine answer literals of premises, by adapting the use of if-then-else within binary resolution [13,14,22] to superposition rules. (ii) We use an answer literal from only one of the rule premises in the rule conclusion and add a new disequality constraint $r \not\simeq r'$ between the premises' answer literal arguments, similar to the constraints $D$ of the computable unifier. Analogously to the computable unifier constraints, we immediately select this disequality constraint $r \not\simeq r'$.

The resulting extension of the Sup calculus with answer literals is given in 
Fig. 3. In addition to the rules of Fig. 3, the extended calculus contains rules
constructed as (9) for superposition and binary resolution rules of Fig. 1. Using 
Lemma 4, we conclude the following. 


Lemma 5 [Soundness of Sup with Answer Literals]. The inference rules 
from Fig. 3 of the extended Sup calculus with answer literals are sound. 


By the soundness results of Lemmas 4-5, Corollaries 2-3 imply that, when applying the calculus of Fig. 3 in the saturation-based program synthesis approach of Algorithm 1, we construct correct programs.


Example 2. We illustrate the use of Algorithm 1 with the extended Sup calculus of Fig. 3, strengthening our motivation from Sect. 3 with if-then-else reasoning. To this end, consider the functional specification over group theory:

\[ \forall x, y.\ \exists z.\ (x * y \not\simeq y * x \rightarrow z * z \not\simeq e), \tag{10} \]

asserting that, if the group is not commutative, there is an element whose square is not $e$. In addition to the axioms (A1)-(A3) of Fig. 2, we also use the right identity axiom (A2') $\forall x.\ x * e \simeq x$.³ Based on Algorithm 1, we obtain the following derivation of the program for (10):

1. $\sigma_1 * \sigma_2 \not\simeq \sigma_2 * \sigma_1 \lor ans(z)$   [preprocessed specification]
2. $e \simeq z * z \lor ans(z)$   [preprocessed specification]
3. $\sigma_1 * \sigma_2 \not\simeq \sigma_2 * \sigma_1$   [answer literal removal 1. (Algorithm 1, line 10)]
4. $x * (x * y) \simeq e * y \lor ans(x)$   [Sup 2., A3]
5. $e \simeq x * (y * (x * y)) \lor ans(x * y)$   [Sup A3, 2.]
6. $x * (x * y) \simeq y \lor ans(x)$   [Sup 4., A2]
7. $x * e \simeq y * (x * y) \lor ans(\text{if } e \simeq x * (y * (x * y)) \text{ then } x \text{ else } x * y)$   [Sup 6., 5.]
8. $y * (x * y) \simeq x \lor ans(\text{if } e \simeq x * (y * (x * y)) \text{ then } x \text{ else } x * y)$   [Sup 7., A2']
9. $x * y \simeq y * x \lor ans(\text{if } x * (y * x) \simeq y \text{ then } x \text{ else if } e \simeq x * (y * (x * y)) \text{ then } x \text{ else } x * y)$   [Sup 6., 8.]
10. $ans(\text{if } \sigma_1 * (\sigma_2 * \sigma_1) \simeq \sigma_2 \text{ then } \sigma_1 \text{ else if } e \simeq \sigma_1 * (\sigma_2 * (\sigma_1 * \sigma_2)) \text{ then } \sigma_1 \text{ else } \sigma_1 * \sigma_2)$   [BR 9., 3.]
11. $\square$   [answer literal removal 10. (Algorithm 1, line 10)]

³ We include axiom (A2') only to shorten the presentation of the obtained derivation.
The programs with conditions collected during saturation-based program synthesis, in particular corresponding to steps 3. and 11. above, are:

\[ P_1[x, y] := \langle z,\ x * y \simeq y * x \rangle \]
\[ P_2[x, y] := \langle \text{if } x * (y * x) \simeq y \text{ then } x \text{ else } (\text{if } e \simeq x * (y * (x * y)) \text{ then } x \text{ else } x * y),\ x * y \not\simeq y * x \rangle \]

Note the variable $z$, representing an arbitrary witness, in $P_1[x, y]$. An arbitrary value is a correct witness in case $x * y \simeq y * x$ holds, as in this case (10) is trivially satisfied. Thus, we do not need to consider the case $x * y \simeq y * x$ separately. Hence, we construct the final program $P[x, y]$ only from $P_2[x, y]$ and obtain:

\[ P[x, y] := \text{if } x * (y * x) \simeq y \text{ then } x \text{ else } (\text{if } e \simeq x * (y * (x * y)) \text{ then } x \text{ else } x * y) \]
We conclude this section by illustrating the benefits of computable unifiers. 

Example 3. Consider the group theory specification

\[ \forall x, y.\ \exists z.\ z * (i(x) * i(y)) \simeq e, \tag{11} \]

describing the inverse element $z$ of $i(x) * i(y)$. We annotate the inverse $i(\cdot)$ as uncomputable to disallow the trivial solution $i(i(x) * i(y))$. Using computable unifiers, we synthesize the program $y * x$; that is, a program computing $y * x$ as the inverse of $i(x) * i(y)$.
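The claim of Example 3 can be sanity-checked on a concrete non-commutative group. The following is a minimal sketch, assuming the symmetric group S3 with permutations stored as tuples; the helper names (`compose`, `inverse`, `synthesized`, `spec_holds`) are illustrative and do not come from the paper or from VAMPIRE.

```python
from itertools import permutations

def compose(p, q):
    """Group operation: (p * q)(k) = p(q(k)) for permutations stored as tuples."""
    return tuple(p[k] for k in q)

def inverse(p):
    """The inverse i(p); used only to state the specification, not by the program."""
    inv = [0] * len(p)
    for k, image in enumerate(p):
        inv[image] = k
    return tuple(inv)

S3 = list(permutations(range(3)))   # all six elements of S3
E = (0, 1, 2)                       # identity element e

def synthesized(x, y):
    """The synthesized program of Example 3: y * x."""
    return compose(y, x)

def spec_holds(x, y, z):
    """Specification (11): z * (i(x) * i(y)) = e."""
    return compose(z, compose(inverse(x), inverse(y))) == E

# The program satisfies the specification for every pair of inputs.
all_ok = all(spec_holds(x, y, synthesized(x, y)) for x in S3 for y in S3)
# Swapping the arguments (x * y) fails on some inputs, since S3 is not commutative.
swapped_ok = all(spec_holds(x, y, compose(x, y)) for x in S3 for y in S3)
```

Note that `synthesized` never calls `inverse`, mirroring the requirement that the uncomputable symbol does not occur in the program.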


6 Computable Unification with Abstraction 


When compared to the Sup calculus of Fig. 1, our extended Sup calculus with answer literals from Fig. 3 uses computable unifiers instead of mgus. To find computable unifiers, we introduce Algorithm 2 by extending a standard unification algorithm [7,18] and an algorithm for unification with abstraction of [17]. Algorithm 2 combines computable unifiers with mgu computation, resulting in the computable unifier $\theta := mgu_{comp}(E_1, E_2, E_3)$ to be further used in Fig. 3.

Algorithm 2 modifies a standard unification algorithm to ensure computability of $E_3\theta$. Changes compared to a standard unification algorithm are highlighted. Algorithm 2 does not add $s \mapsto t$ to $\theta$ if $s$ is a variable in $E_3$ and $t$ is uncomputable. Instead, if $t$ is $f(t_1, \ldots, t_n)$ where $f$ is computable but not all $t_1, \ldots, t_n$ are computable, we extend $\theta$ by $s \mapsto f(x_1, \ldots, x_n)$ and then add equations $x_1 = t_1, \ldots, x_n = t_n$ to the set of equations $\mathcal{E}$ to be processed. Otherwise, $f$ is uncomputable and we perform an abstraction: we consider $s$ and $t$ to be unified under the condition that $s \simeq t$ holds. Therefore we add a constraint $s \not\simeq t$ to the set of literals $\mathcal{D}$ which will be added to any clause invoking the computable unifier. To discharge the literal $s \not\simeq t$, one must prove $s \simeq t$. While $s$ can be later substituted for other terms, as long as we use $mgu_{comp}$, $s$ will never be substituted for an uncomputable term. Thus, we conclude the following result.


Theorem 6. Let $E_1, E_2, E_3$ be expressions. Then $(\theta, D) := mgu_{comp}(E_1, E_2, E_3)$ is a computable unifier.
Algorithm 2. Computable Unification with Abstraction

function mgu_comp(E1, E2, E3)
  if E3 is uncomputable then fail
  let ℰ be a set of equations and θ be a substitution; ℰ := {E1 = E2}; θ := {}
  let 𝒟 be a set of disequalities; 𝒟 := ∅
  repeat
    if ℰ is empty then
      return (θ, D) where D is the disjunction of literals in 𝒟
    Select an equation s = t in ℰ and remove it from ℰ
    if s coincides with t then do nothing
    else if s is a variable and s does not occur in t then
      if s does not occur in E3 or t is computable then θ := θ ∘ {s ↦ t}; ℰ := ℰ{s ↦ t}
      else if t = f(t1,...,tn) and f is computable then
        θ := θ ∘ {s ↦ f(x1,...,xn)}; ℰ := ℰ{s ↦ f(x1,...,xn)} ∪ {x1 = t1,...,xn = tn}
        where x1,...,xn are fresh variables
      else if t = f(t1,...,tn) and f is uncomputable then 𝒟 := 𝒟 ∪ {s ≄ t}
    else if s is a variable and s occurs in t then fail
    else if t is a variable then ℰ := ℰ ∪ {t = s}
    else if s and t have different top-level symbols then fail
    else if s = f(s1,...,sn) and t = f(t1,...,tn) then ℰ := ℰ ∪ {s1 = t1,...,sn = tn}
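The steps of Algorithm 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the VAMPIRE implementation: variables are strings, compound terms and constants are tuples `(symbol, arg1, ..., argn)`, `uncomputable` is a set of symbol names, and failure is signalled by `None`. One reading is assumed explicitly: the test "s occurs in E3" is performed against E3 under the current substitution, so that the fresh variables introduced for a computable head are also protected.

```python
from itertools import count

def is_var(t):
    return isinstance(t, str)

def occurs(v, t):
    return t == v if is_var(t) else any(occurs(v, a) for a in t[1:])

def subst(t, sigma):
    if is_var(t):
        return sigma.get(t, t)
    return (t[0],) + tuple(subst(a, sigma) for a in t[1:])

def computable(t, uncomputable):
    return is_var(t) or (t[0] not in uncomputable
                         and all(computable(a, uncomputable) for a in t[1:]))

def mgu_comp(e1, e2, e3, uncomputable):
    """Return (theta, diseqs) or None on failure; diseqs lists pairs (s, t) read as s != t."""
    if not computable(e3, uncomputable):
        return None
    eqs, theta, diseqs = [(e1, e2)], {}, []
    fresh = count()

    def bind(v, t):
        # theta := theta o {v -> t}; apply the binding to pending equations and to E3
        nonlocal eqs, e3
        step = {v: t}
        for k in list(theta):
            theta[k] = subst(theta[k], step)
        theta[v] = t
        eqs = [(subst(a, step), subst(b, step)) for a, b in eqs]
        e3 = subst(e3, step)

    while eqs:
        s, t = eqs.pop()
        if s == t:
            continue
        if is_var(s) and not occurs(s, t):
            if not occurs(s, e3) or computable(t, uncomputable):
                bind(s, t)
            elif t[0] not in uncomputable:
                # computable head, uncomputable argument(s): descend via fresh variables
                xs = [f"_x{next(fresh)}" for _ in t[1:]]
                bind(s, (t[0], *xs))
                eqs.extend(zip(xs, t[1:]))
            else:
                # abstraction: keep s unbound and record the constraint s != t
                diseqs.append((s, t))
        elif is_var(s):
            return None                      # occurs check
        elif is_var(t):
            eqs.append((t, s))
        elif s[0] != t[0] or len(s) != len(t):
            return None                      # different top-level symbols
        else:
            eqs.extend(zip(s[1:], t[1:]))
    return theta, diseqs
```

For instance, with `i` marked uncomputable, unifying the answer variable `Z` with `i(a)` yields the empty substitution together with the constraint `Z ≄ i(a)`, rather than the binding `Z ↦ i(a)`.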


7 Implementation and Experiments 


Implementation. We implemented our saturation-based program synthesis approach in the VAMPIRE prover [11]. We used Algorithm 1 with the extended Sup calculus of Fig. 3. The implementation, consisting of approximately 1100 lines of C++ code, is available at https://github.com/vprover/vampire/tree/synthesis-pr. The synthesis functionality can be turned on using the option
--question_answering synthesis.

VAMPIRE accepts functional specifications in an extension of the SMT-LIB2 
format [4], by using the new command assert-not to mark the specifica- 
tion. We consider interpreted theory symbols to be computable. Uninterpreted 
symbols can be annotated as uncomputable via the command (set-option :uncomputable (symbol1 ... symbolN)).

Our implementation also integrates Algorithm 1 with the AVATAR architecture [26]. We modified the AVATAR framework to only allow splitting over ground computable clauses that do not contain answer literals. Further, if we derive a clause $C[\bar\sigma] \lor ans(r[\bar\sigma])$ with AVATAR assertions $C_1[\bar\sigma], \ldots, C_m[\bar\sigma]$, where $C[\bar\sigma]$ is ground and computable, we replace it by the clause $C[\bar\sigma] \lor \bigvee_{i=1}^{m} \lnot C_i[\bar\sigma] \lor ans(r[\bar\sigma])$ without any assertions. We then immediately record a program with conditions $\langle r[\bar x],\ \lnot C[\bar x] \land \bigwedge_{i=1}^{m} C_i[\bar x] \rangle$, and replace the clause by $C[\bar\sigma] \lor \bigvee_{i=1}^{m} \lnot C_i[\bar\sigma]$ (see lines 7-10 of Algorithm 1), which may then be further split by AVATAR.

Finally, our implementation simplifies the programs we synthesize. If during Algorithm 1 we record a program $\langle z, F \rangle$ where $z$ is a variable, we do not use this program in the final program construction (line 12 of Algorithm 1) even if $F$ occurs in the derivation of $\square$ (see Example 2).
Examples and Experimental Setup. The goal of our experimental evaluation 
is to showcase the benefits of our approach on problems that are deemed to 
be hard, even unsolvable, by state-of-the-art synthesis techniques. We therefore 
focused on first-order theory reasoning and evaluated our work on the group 
theory problems of Examples 1-3, as well as on integer arithmetic problems. 

As the SMT-LIB2 format can easily be translated into the SyGuS 2.1 syntax [16], we compared our results to cvc5 1.0.4 [3], supporting SyGuS-based synthesis [2]. Our experiments were run on an AMD Epyc 7502, 2.5 GHz CPU with 1 TB RAM, using a 5 min time limit per example. Our benchmarks as well as the configurations for our experiments are available at: https://github.com/vprover/vampire_benchmarks/tree/master/synthesis


Experimental Results with Group Theory Properties. VAMPIRE syn- 
thesizes the solutions of the Examples 1-3 in 0.01, 13, and 0.03s, respectively. 
Since these examples use uninterpreted functions, they cannot be encoded in the 
SyGuS 2.1 syntax, showcasing the limits of other synthesis tools. 


Experimental Results with Maximum of n ≥ 2 Integers. For the maximum of 2 integers, the specification is $\forall x_1, x_2 \in \mathbb{Z}.\ \exists y \in \mathbb{Z}.\ (y \geq x_1 \land y \geq x_2 \land (y = x_1 \lor y = x_2))$, and the program we synthesize is $\text{if } x_1 < x_2 \text{ then } x_2 \text{ else } x_1$. Both our work and cvc5 are able to synthesize programs choosing the maximal value for up to n = 23 input variables, as summarized below. For n > 23, both VAMPIRE and cvc5 time out.


Number n of variables for which max is synthesized |  2   |  5   | 10   | 15  | 20 | 22  | 23
VAMPIRE (time in s)                                | 0.03 | 0.03 | 0.05 | 1   | 13 | 55  | 215
cvc5 (time in s)                                   | 0.01 | 0.03 | 0.6  | 6.8 | 88 | 188 | 257
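The specification and the synthesized two-input program can be written down and checked exhaustively on a small range of integers. The sketch below is only illustrative: `spec` and `max2` transcribe the formulas given above, while `max_n`, which folds `max2` over more inputs, is a hypothetical generalisation and not the program shape reported by either tool.

```python
from itertools import product

def spec(xs, y):
    """y is at least every input and equal to one of them."""
    return all(y >= x for x in xs) and any(y == x for x in xs)

def max2(x1, x2):
    """The synthesized program for n = 2: if x1 < x2 then x2 else x1."""
    return x2 if x1 < x2 else x1

def max_n(xs):
    """Hypothetical n-ary variant obtained by folding max2 (an assumption)."""
    y = xs[0]
    for x in xs[1:]:
        y = max2(y, x)
    return y

SAMPLE = range(-3, 4)
two_ok = all(spec((a, b), max2(a, b)) for a, b in product(SAMPLE, repeat=2))
three_ok = all(spec(xs, max_n(xs)) for xs in product(SAMPLE, repeat=3))
```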


Experimental Results with Polynomial Equations. VAMPIRE can synthesize the solution of polynomial equations; for example, for $\forall x_1, x_2 \in \mathbb{Z}.\ \exists y \in \mathbb{Z}.\ (y^2 = x_1^2 + 2 x_1 x_2 + x_2^2)$, we synthesize $x_1 + x_2$. VAMPIRE finds the corresponding program in 26 s using simple first-order reasoning, while cvc5 fails in our setup.


8 Related Work 


Our work builds upon deductive synthesis [14] adapted for the resolution calcu- 
lus [13,22]. We extend this line of work with saturation-based program synthesis, 
by using adjustments of the superposition calculus. 

Component-based synthesis of recursion-free programs [21] from logical specifications is addressed in [6,21,24]. The work of [21] uses first-order theorem proving to prove specifications and extract programs from proofs. In [6,24], $\exists\forall$ formulas are produced to capture specifications over component properties and SMT solving is applied to find a term satisfying the formula, corresponding to a straight-line program. We complement [21] with saturation-based superposition proving and avoid template-based SMT solving from [6,24].

A prominent line of research comes with syntax-guided synthesis (SyGuS) [1], 
where functional specifications are given using a context-free grammar. This 
grammar yields program templates to be synthesized via an enumerative search 
procedure based on SMT solving [3,9]. We believe our work is complementary 
to SyGuS, by strengthening first-order reasoning for program synthesis, as evi- 
denced by Examples 1-3. 

The sketching technique [19,25] synthesizes program assignments to vari- 
ables, using an alternative framework to the program synthesis setting we rely 
upon. In particular, sketching addresses domains that do not involve input logical 
formulas as functional specifications, such as example-guided synthesis [23]. 


9 Conclusions 


We extend saturation-based proof search to saturation-based program synthesis, 
aiming to derive recursion-free programs from specifications. We integrate answer 
literals with saturation, and modify the superposition calculus and unification to 
synthesize computable programs. Our initial experiments show that a first-order 
theorem prover becomes an efficient program synthesizer, potentially opening 
up interesting avenues toward recursive program synthesis, for example using 
saturation-based proving with induction. 


Acknowledgements. We thank Haniel Barbosa for support with experiments with [...] FWF grants LogiCS W1255-N23 and LOCOTES P 35787.
Abstract. We present a uniform characterisation of three-valued logics by means of bisequent calculus (BSC). It is a generalised form of sequent calculus (SC) where rules operate on ordered pairs of ordinary sequents. BSC may be treated as the weakest kind of system in the rich family of generalised SC operating on items that are collections of ordinary sequents. This family covers several forms of hypersequent and nested sequent calculi introduced to provide decent SC for several non-classical logics. It seems that for many non-classical logics, including some many-valued, paraconsistent and modal logics, this reasonably modest generalisation of standard SC is sufficient. In this paper we examine a variety of three-valued logics and show how they can be formalised in the framework of bisequent calculus. All provided systems are cut-free and satisfy the subformula property. The interpolation theorem is also proved constructively for some logics.


Keywords: Bisequent Calculus · Cut elimination · Many-valued Logic · Three-valued logic · Interpolation Theorem


1 Introduction 


The aim of this paper is to provide a uniform characterization of a variety of 
three-valued logics by means of a simple cut-free generalised sequent calculus 
(SC) called bisequent calculus (BSC). It is the weakest kind of system in the 
rich family of generalised sequent calculi operating on collections of ordinary 
sequents [23]. If we restrict our interest to structures built of two sequents only, 
we obtain a limiting case of either hypersequent or nested sequent calculi; it is 
what we call bisequent calculus. 

Is such a restricted calculus of any use? Hypersequent calculi may already be seen as a quite restrictive form of generalised SC, yet they were shown to be useful in many fields (see, e.g., [25] for a survey of applications of hypersequent calculi in modal logic, and [37] for their use in fuzzy logic). BSC is even more restrictive, but preliminary work on its application is promising. It has already been successfully applied to first-order modal logic S5 [23] and to the class of four-valued quasi-relevant logics [27]. In what follows we will focus on another application of this minimal framework: three-valued logics.

Several proof systems of different kinds have been proposed so far for many-valued logics (see e.g. Hähnle [20] for a survey). The most direct and popular approach to the construction of many-valued sequent or tableau systems is based on the idea of a syntactic representation of n values either by means of n-sided sequents (e.g. [8,45,56]) or by n labels attached to formulae or sets of formulae (e.g. [11,53,55]). This solution was presented by many authors and, despite its popularity, has many drawbacks (see [25] for discussion). A significant improvement in the construction of efficient SC or tableau systems for many-valued logic was proposed independently by Doherty [15] and Hähnle [19], where labels correspond not to single values but to their sets (sets-as-signs). Among other proof-theoretic approaches to many-valued logics let us mention Caleiro and Marcelino's [10] analytic calculi for many-valued non-deterministic logics as well as the result by Grätz [18], who has recently developed analytic tableau systems based on sets-as-signs DNF representations with a correspondence to canonical sequent calculi.

Although BSC is a strictly syntactical calculus, its semantical interpretation makes it similar to the sets-as-signs approach. A fuller discussion of this issue is provided in [27]. BSC is uniform in the sense that all three-valued logics are characterised by the same set of axiomatic sequents, and in the case of logics having the same set of connectives (i.e. defined in the same way) the rules are identical even if the set of designated values or the consequence relation is defined in a different way. In this sense BSC is more uniform than several other approaches where either the set of axioms must be changed or the rules for connectives must be different (even if described by means of the same table). In particular, BSC is superior in this respect to the generalised calculus presented in [25].

Section 2 has a rather encyclopaedic character and provides a self-contained description of a representative selection of three-valued logics. Section 3 contains a case study of BSC for K3 and LP. In Sect. 4 we provide rules for the connectives of all logics introduced in Sect. 2. Section 5 shows how BSC can be applied to prove interpolation for some three-valued paraconsistent and paracomplete logics. We finish with remarks on possible extensions and a comparison with other approaches to the formalisation of many-valued logics.


2 Logics 


We will examine several three-valued propositional logics determined by three-element matrices with classical-like connectives (negation, disjunction, conjunction, and implication, plus the usual three-valued modal-style connectives); we are not going to consider other types of connectives because of the lack of space. The languages of these logics are freely generated algebras similar to three-element algebras of values. Logics are interpreted by homomorphisms from languages to algebras such that $h(c^n(\varphi_1, \ldots, \varphi_n)) = c(h(\varphi_1), \ldots, h(\varphi_n))$ for every $n$-ary connective $c$ and the corresponding operation $c$.

Let us consider as the starting point two three-element Kleene algebras of the form $\mathfrak{A}_3 = \langle A, O \rangle$, where $A = \{0, u, 1\}$ and $O$ contains a unary operation $\neg : A \to A$ and binary operations $\circ : A \times A \to A$, where $\circ \in \{\wedge, \vee, \rightarrow\}$. The operations are defined by the following truth tables in the strong and weak Kleene algebra; the latter considered also by Bochvar [9] (negation is the same in both):


 ∧ | 1 u 0    ∨ | 1 u 0    → | 1 u 0    ¬ |      ∧w | 1 u 0    ∨w | 1 u 0    →w | 1 u 0
 1 | 1 u 0    1 | 1 1 1    1 | 1 u 0    1 | 0    1  | 1 u 0    1  | 1 u 1    1  | 1 u 0
 u | u u 0    u | 1 u u    u | 1 u u    u | u    u  | u u u    u  | u u u    u  | u u u
 0 | 0 0 0    0 | 1 u 0    0 | 1 1 1    0 | 1    0  | 0 u 0    0  | 1 u 0    0  | 1 u 1


We obtain four matrices by specifying a set of designated values $D$ either as $\{1\}$ or $\{1, u\}$. These are called $\mathfrak{SM}_1$, $\mathfrak{SM}_2$, $\mathfrak{WM}_1$ and $\mathfrak{WM}_2$ (where $\mathfrak{S}$ stands for strong, $\mathfrak{W}$ for weak, 1 and 2 indicate the number of designated values). In general we will call matrices with $D = \{1\}$ 1-matrices, and those with $D = \{1, u\}$ 2-matrices. Accordingly we will also call logics determined by 1-matrices and 2-matrices 1-logics and 2-logics, respectively. For any matrix $\mathfrak{M}$ we define a relation of matrix consequence in the standard way:

\[ \Gamma \models_{\mathfrak{M}} \varphi \ \text{ iff for any homomorphism } h: \text{ if } h(\Gamma) \subseteq D, \text{ then } h(\varphi) \in D. \]

Logics are identified with their matrix consequences. In particular, logics determined by these matrices are $K_3$ (strong Kleene 1-logic) [31], LP, the logic of paradox of Asenjo and Priest (the corresponding 2-logic) [2,42], $K_3^w$ (weak Kleene 1-logic) [31], and PWK (paraconsistent weak Kleene 2-logic) of Halldén [21].
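The definitions above can be made concrete with a small brute-force check of matrix consequence. The following sketch assumes the encoding u = 0.5 (so that the strong Kleene operations become min, max and 1 - a) and represents formulae as Python functions of a valuation tuple; all names are illustrative and not taken from the paper.

```python
from itertools import product

U = 0.5                      # the third value u, placed between 0 and 1
VALUES = (0, U, 1)

def neg(a):
    return 1 - a

def s_and(a, b):             # strong Kleene conjunction
    return min(a, b)

def s_or(a, b):              # strong Kleene disjunction
    return max(a, b)

def s_imp(a, b):             # strong Kleene implication
    return max(1 - a, b)

def weak(op):
    """Weak Kleene version of a binary operation: u is infectious."""
    return lambda a, b: U if U in (a, b) else op(a, b)

def entails(premises, conclusion, designated, n_vars):
    """Matrix consequence, checked over all valuations of n_vars atoms."""
    for v in product(VALUES, repeat=n_vars):
        if all(p(v) in designated for p in premises) and conclusion(v) not in designated:
            return False
    return True

K3, LP = {1}, {1, U}         # designated values of the strong 1- and 2-matrix

excluded_middle = lambda v: s_or(v[0], neg(v[0]))
contradiction = lambda v: s_and(v[0], neg(v[0]))
other_atom = lambda v: v[1]
```

With these definitions, `entails([], excluded_middle, LP, 1)` holds while the same call with `K3` does not, and `entails([contradiction], other_atom, K3, 2)` holds while the LP variant fails, which illustrates the difference between one and two designated values on the same tables.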

Let us consider a few modifications of strong and weak Kleene logics. McCarthy's logic $K_3^{\rightarrow}$ [36] (also called Kleene's sequential and studied by Fitting [16]) and its interesting modification presented by Komendantskaya [32] under the name $K_3^{\leftarrow}$ are given by the following truth tables (again, negation is unchanged):

 ∧mc | 1 u 0    ∨mc | 1 u 0    →mc | 1 u 0    ∧K | 1 u 0    ∨K | 1 u 0    →K | 1 u 0
 1   | 1 u 0    1   | 1 1 1    1   | 1 u 0    1  | 1 u 0    1  | 1 u 1    1  | 1 u 0
 u   | u u u    u   | u u u    u   | u u u    u  | u u 0    u  | 1 u u    u  | 1 u u
 0   | 0 0 0    0   | 1 u 0    0   | 1 1 1    0  | 0 u 0    0  | 1 u 0    0  | 1 u 1

Both $K_3^{\rightarrow}$ and $K_3^{\leftarrow}$ are logics determined by 1-matrices. An important property of $K_3$, $K_3^w$, $K_3^{\rightarrow}$, and $K_3^{\leftarrow}$ is that they are the only three-valued logics with one designated value which produce partial recursive predicates (see [31,32] for more details).

Several other important logics are obtained by changing the definitions of — 
and 7. Consider Lukasiewicz’s [34], Stupecki’s [49], Heyting’s [22] implications 
as well as Heyting’s [22], Bochvar’s [9], Post’s and dual Post’s [41] negations. Let 
us also consider yet another pair of additive conjunction and disjunction, arising 
in Lukasiewicz’s logic: 
1 u 0|/>s7]1 u 0| |-—>z]1 u 0 H 1B p pP||AL|l u O| |vz|l u0 
1u0 1u0d 1u0|i1 1 1 1| 0 1 |l1u0]|1ļ111 
liu 111 1 10ļ|ju u u ul 1 ujudO0;/;/ujliu 
The strong 1-matrix with Łukasiewicz's implication (instead of Kleene's one) yields the famous Łukasiewicz logic Ł3, the first many-valued logic. In Ł3 we may deal with two pairs of conjunction and disjunction. We have: $\varphi \vee \psi = (\varphi \rightarrow_{Ł} \psi) \rightarrow_{Ł} \psi$ and $\varphi \wedge \psi = \neg(\neg\varphi \vee \neg\psi)$, but $\varphi \wedge_{Ł} \psi = \neg(\varphi \rightarrow_{Ł} \neg\psi)$ and $\varphi \vee_{Ł} \psi = \neg\varphi \rightarrow_{Ł} \psi$. The strong 1-matrix with Słupecki's implication is an alternative to Ł3 having the deduction theorem. It 
was studied by Słupecki, Bryll, and Prucnal [49] as well as Avron [4], under 
the name GM3. If we change negation and implication of SM? to Heyting’s 
ones, then we get Heyting’s [22] logic G3, a close relative of intuitionistic logic 
(the name after Gédel who also studied it [17]; this logic was investigated by 
Jaskowski as well [29]). The disjunction of Gt} and Post’s cyclic negation from 
Post’s logic P3 [41] which is known for being functionally complete in the three- 
valued setting. In [40], a dual cyclic negation spp was suggested (it reverses 
the direction of cyclicality of Post’s negation). Sms with Heyting’s implication 
and Bochvar’s negation was investigated by Osorio and Carballido [38] under 
the name G4. In the case of Gt} the following connectives are interesting 
as well: Sobociriski’s [51] conjunction, disjunction, and implications as well as 
D’Ottaviano/DaCosta/Jaskowski/Stupecki’s implication [13,30, 48]: 


Ag|1 u 0l [Vsl1 u 0] |—>sl1 u 0) |>%l1 u 0] /47]1 u0 
1/1101]111 1 J100 1 11u01 llu0 
ujlu0O/}| u jluO}} u u0 u 110| u jl ud 
0/000}]/0 j100}| O J111}) O 1u1l| O |111 


Sobocinski’s logic S is obtained from Sms by the replacement of all binary 
connectives of this matrix with Sobocinski’s original ones. This logic may be 
treated as a relevant logic. However, a more popular three-valued relevant 
logic is Anderson and Belnap’s RM; [1] which is obtained from GI} only 
by the replacement of its implication with Sobocinski’s one. Note that ear- 
lier Sobociński [50] considered yet another implication >‘. GMJ with the 
implication due to D’Ottaviano/DaCosta/Jaskowski/Stupecki (first mentioned 
by Shipecki [48]) instead of Kleene’s one was independently studied by sev- 
eral authors: D’Ottaviano and da Costa themselves [13,14], Asenjo and Tam- 
burino [3], Batens [7] (under the name PI‘), Avron [5] (under the name RM?), 
and Rozonoer [44] (under the name PCont). An important extension of this 
logic is J3 by D’Ottaviano and da Costa [13]. It has an additional connective 
which is Lukasiewicz’s tabular possibility operator (see below; we also present 


Lukasiewicz’s tabular necessity operator). 
Ac|l u0]|Vc}1lu0}|>c|1 u0 Ol} o 
1 100| 1/111 1 /100||1/1|1|1 
u |000| u 100} u |11 L}jufl}jujo 
0/000 0]100 O |1 1 1)/0/0)}0)0 
As is easy to guess, since RM3 may be viewed as a relevant logic, it should be paraconsistent as well. Moreover, J3, S3, LP, and many other three-valued logics with two designated values are paraconsistent (in contrast, three-valued logics with one designated value are paracomplete). One of the most famous three-valued paraconsistent logics is Sette's logic $P^1$ [47]. It has Bochvar's negation and the above presented binary connectives (both 1 and u are designated). There is a version of $P^1$ with Kleene's negation introduced by Carnielli and Marcos [12,35] and called $P^2$. A paracomplete companion of $P^1$, the logic $I^1$, was presented by Sette and Carnielli [46]: it has Heyting's negation and the binary connectives presented below (the implication was first introduced by Bochvar [9]). Its version with Kleene's negation is $I^2$ due to Marcos [35]. Both $I^1$ and $I^2$ have one designated value.


Ase|l u 0| iVse|l u0)|>ge)1 u 0/|>R|1 u0}|>r}1 u 0 
1 |110| 1 J111}) 1 [1 10)) 1 1100|| 1 1100 
u |110| u |111 u 110| u |110] u Jl iu 
O 1000| O |110 O |111| O |111| O |1111 


Last but not least, let us mention Rescher’s [43] and Tomova’s [57] impli- 
cations (added above). These implications can be added to GM. Tomova 
[57] introduced the concept of natural implication. In three-valued case with 
one designated value there are only 6 natural implications: Lukasiewicz’s, 
Stupecki’s, Heyting’s, Bochvar’s, Rescher’s, and Tomova’s. In the case with 
two designated values there are 24 natural implications, including Heyting’s 
and Rescher’s as well as D’Ottaviano/DaCosta/Jaskowski/Stupecki’s implica- 
tion, both Sobocinski’s implications, and Sette’s implication. 


3 Bisequent Calculus for K3 (and LP) 


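Bisequents in BSC are ordered pairs of sequents $\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma$, where $\Gamma, \Delta, \Pi, \Sigma$ are finite (possibly empty) multisets of formulae. We will call the left component of a bisequent the 1-sequent and the right one the 2-sequent. Bisequents with all elements being atomic will also be called atomic. In what follows $B$ stands for arbitrary bisequents and $S$ for sequents.

Let us define the calculus BSC-K3 which provides an adequate formalisation of K3. A bisequent $\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma$ is axiomatic iff it has nonempty $\Gamma \cap \Sigma$ or $\Gamma \cap \Delta$ or $\Pi \cap \Sigma$. In fact this set of axioms is fixed for all considered calculi. If constants $\top, \bot, U$ (the last for a fixed undefined proposition) are added, we must add axioms of the form: $\Gamma \Rightarrow \Delta, \top \mid \Pi \Rightarrow \Sigma$; $\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \top$; $\bot, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma$; $\Gamma \Rightarrow \Delta \mid \bot, \Pi \Rightarrow \Sigma$; $U, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma$ and $\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, U$.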

The set of rules characterising the operations of the strong Kleene algebra 
consists of the following schemata: 


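\[ (\neg{\Rightarrow}{\mid})\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi}{\neg\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} \qquad ({\Rightarrow}\neg{\mid})\ \frac{\Gamma \Rightarrow \Delta \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \neg\varphi \mid \Pi \Rightarrow \Sigma} \]

\[ ({\mid}\neg{\Rightarrow})\ \frac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \neg\varphi, \Pi \Rightarrow \Sigma} \qquad ({\mid}{\Rightarrow}\neg)\ \frac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \neg\varphi} \]

\[ (\wedge{\Rightarrow}{\mid})\ \frac{\varphi, \psi, \Gamma \Rightarrow \Delta \mid S}{\varphi \wedge \psi, \Gamma \Rightarrow \Delta \mid S} \qquad ({\Rightarrow}\wedge{\mid})\ \frac{\Gamma \Rightarrow \Delta, \varphi \mid S \qquad \Gamma \Rightarrow \Delta, \psi \mid S}{\Gamma \Rightarrow \Delta, \varphi \wedge \psi \mid S} \]

\[ ({\mid}\wedge{\Rightarrow})\ \frac{S \mid \varphi, \psi, \Gamma \Rightarrow \Delta}{S \mid \varphi \wedge \psi, \Gamma \Rightarrow \Delta} \qquad ({\mid}{\Rightarrow}\wedge)\ \frac{S \mid \Gamma \Rightarrow \Delta, \varphi \qquad S \mid \Gamma \Rightarrow \Delta, \psi}{S \mid \Gamma \Rightarrow \Delta, \varphi \wedge \psi} \]

\[ ({\Rightarrow}\vee{\mid})\ \frac{\Gamma \Rightarrow \Delta, \varphi, \psi \mid S}{\Gamma \Rightarrow \Delta, \varphi \vee \psi \mid S} \qquad (\vee{\Rightarrow}{\mid})\ \frac{\varphi, \Gamma \Rightarrow \Delta \mid S \qquad \psi, \Gamma \Rightarrow \Delta \mid S}{\varphi \vee \psi, \Gamma \Rightarrow \Delta \mid S} \]

\[ ({\mid}{\Rightarrow}\vee)\ \frac{S \mid \Gamma \Rightarrow \Delta, \varphi, \psi}{S \mid \Gamma \Rightarrow \Delta, \varphi \vee \psi} \qquad ({\mid}\vee{\Rightarrow})\ \frac{S \mid \varphi, \Gamma \Rightarrow \Delta \qquad S \mid \psi, \Gamma \Rightarrow \Delta}{S \mid \varphi \vee \psi, \Gamma \Rightarrow \Delta} \]

\[ ({\Rightarrow}{\rightarrow}{\mid})\ \frac{\Gamma \Rightarrow \Delta, \psi \mid \varphi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta, \varphi \rightarrow \psi \mid \Pi \Rightarrow \Sigma} \qquad ({\mid}{\Rightarrow}{\rightarrow})\ \frac{\varphi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \psi}{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \rightarrow \psi} \]

\[ ({\rightarrow}{\Rightarrow}{\mid})\ \frac{\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma, \varphi \qquad \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma}{\varphi \rightarrow \psi, \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma} \]

\[ ({\mid}{\rightarrow}{\Rightarrow})\ \frac{\Gamma \Rightarrow \Delta, \varphi \mid \Pi \Rightarrow \Sigma \qquad \Gamma \Rightarrow \Delta \mid \psi, \Pi \Rightarrow \Sigma}{\Gamma \Rightarrow \Delta \mid \varphi \rightarrow \psi, \Pi \Rightarrow \Sigma} \]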

Note that all rules satisfy the subformula property and other desirable prop- 
erties of well-behaved SC. In particular, they are context independent in the 
sense that validity-preservation of rules is intact by deletion or addition of the 
same parameters in the premisses and conclusion. This feature will be of special 
importance for the proof of the interpolation theorem. One may easily observe 
that in case of the rules for strong A,V we have just standard G3 rules but 
repeated in both components. Rules for negation and implication have different 
character since side and principal formula are in different sequents in all cases. 

Bisequents as such do not directly correspond to standard consequence rela- 
tions in suitable matrices. Hence before we define the notion of a proof in BSC-K3 
(or any other logic) it is better to start with more general concept. A proof-search 
tree for a bisequent B in BSC-L, where L is any logic, is a tree of bisequents with 
B as the root and nodes generated by rules of BSC-L. A proof-search tree is com- 
plete iff every leaf is atomic, and it is axiomatic iff all leaves are axiomatic. The 
height of a proof-search tree is defined as the length of the maximal branches. 
A simple consequence of the subformula property of rules is: 


Proposition 1. Every proof-search tree may be extended to a complete proof- 
search tree. 


The notion of a proof in BSC-K3 is introduced not only by restricting the class of proof-search trees in BSC-K3 to axiomatic ones but also by restricting the class of admissible roots. In general the rationale for bisequents is that the 1-sequent corresponds to the consequence relation in 1-matrices and the 2-sequent to the consequence relation in 2-matrices. Since K3 is characterised by a 1-matrix we have:

BSC-K3 $\vdash B$ iff there is an axiomatic proof-search tree for $B := \Gamma \Rightarrow \varphi \mid\ \Rightarrow$.


We define the L-validity (L-satisfiability) of bisequents in the following way:

$\models_L \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma$ iff every homomorphism $h$ satisfies $\Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma$. The latter holds for $h$ iff for some $\varphi$: either ($\varphi \in \Gamma$ and $h(\varphi) \neq 1$) or ($\varphi \in \Delta$ and $h(\varphi) = 1$) or ($\varphi \in \Pi$ and $h(\varphi) = 0$) or ($\varphi \in \Sigma$ and $h(\varphi) \neq 0$).

Clearly $\not\models_L \Gamma \Rightarrow \Delta \mid \Pi \Rightarrow \Sigma$ iff for some $h$, all elements of $\Gamma$ are true, all elements of $\Delta$ are either false or undefined, all elements of $\Pi$ are either true or undefined and all elements of $\Sigma$ are false. In this case we say that $h$ falsifies this bisequent.

Obviously, all axiomatic bisequents are valid for any logic L. As for the rules 
they are not only sound (i.e. validity-preserving) but also invertible; namely it 
holds: 


Theorem 1. For all rules of BSC-K3, all premisses are K3-valid iff the conclu- 
sion is K3-valid. 


Proof. Straightforward proof by tedious checking. 


A simple consequence of this theorem is that for every rule the conclusion is 
falsified by some h iff at least one premiss is falsified by the same h. 
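The "tedious checking" behind Theorem 1 can be mechanised for a single rule. The sketch below transcribes the satisfaction clauses for bisequents and compares, for every valuation, the conclusion $\Rightarrow p \rightarrow q \mid\ \Rightarrow$ with the premiss $\Rightarrow q \mid p \Rightarrow$ of the rule that introduces a strong Kleene implication into the succedent of the 1-sequent. The encoding u = 0.5, the function names, and the reading of that rule are assumptions made for illustration; contexts are left empty, relying on the context independence noted earlier.

```python
from itertools import product

VALUES = (0, 0.5, 1)         # 0.5 plays the role of u

def satisfied(h, gamma, delta, pi, sigma):
    """h satisfies Gamma => Delta | Pi => Sigma, clause by clause as in the text."""
    return (any(h[f] != 1 for f in gamma) or any(h[f] == 1 for f in delta)
            or any(h[f] == 0 for f in pi) or any(h[f] != 0 for f in sigma))

def implication_rule_is_invertible():
    """Premiss and conclusion are satisfied by exactly the same valuations."""
    for p, q in product(VALUES, repeat=2):
        h = {"p": p, "q": q, "p->q": max(1 - p, q)}
        conclusion = satisfied(h, [], ["p->q"], [], [])   #  => p -> q |   =>
        premiss = satisfied(h, [], ["q"], ["p"], [])      #  => q      | p =>
        if conclusion != premiss:
            return False
    return True
```

The same loop, with the dictionary entry and the two bisequents swapped out, can be repeated for each of the remaining rules.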


Theorem 2 (Soundness). If BSC-K3 $\vdash \Gamma \Rightarrow \varphi \mid\ \Rightarrow$, then $\Gamma \models_{K_3} \varphi$.


Proof. By induction on the height of the proof, use Theorem 1. 


Invertibility of all rules implies that proof search process is confluent, i.e. 
that the order of applications of rules does not affect the result. In particular, B 
is provable iff every proof-search tree may be extended to obtain a proof. 


Theorem 3 (Completeness). If $\Gamma \models_{K_3} \varphi$, then BSC-K3 $\vdash \Gamma \Rightarrow \varphi \mid\ \Rightarrow$.


Proof. Assume that $\Gamma \models_{K_3} \varphi$ but BSC-K3 $\nvdash \Gamma \Rightarrow \varphi \mid\ \Rightarrow$. Hence in every complete proof-search tree for $\Gamma \Rightarrow \varphi \mid\ \Rightarrow$ there is at least one branch starting with a non-axiomatic atomic bisequent falsified by some $h$. Since all rules inherit this valuation, the root is also falsified, contrary to our assumption.


As a simple consequence we obtain also a decision procedure for K3 (and for 
other logics L with complete BSC-L). Another by-product of our proof is that 
the following cut rules are admissible in BSC-K3 (and other logics): 


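$$\frac{\Gamma \Rightarrow \Delta, \varphi \mid \Lambda \Rightarrow \Theta \qquad \varphi, \Pi \Rightarrow \Sigma \mid \Xi \Rightarrow \Omega}{\Gamma, \Pi \Rightarrow \Delta, \Sigma \mid \Lambda, \Xi \Rightarrow \Theta, \Omega}\ (Cut \mid)$$ 


$$\frac{\Gamma \Rightarrow \Delta \mid \Lambda \Rightarrow \Theta, \varphi \qquad \Pi \Rightarrow \Sigma \mid \varphi, \Xi \Rightarrow \Omega}{\Gamma, \Pi \Rightarrow \Delta, \Sigma \mid \Lambda, \Xi \Rightarrow \Theta, \Omega}\ (\mid Cut)$$ 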


Moreover, we can constructively prove that these cut rules are admissible in 
the same way as it is done for four-valued logics in [27]. Due to lack of space we 
omit this issue here. 

Note that the rules stated above provide BSC not only for K3 but also for 
LP. The only difference is that in LP we consider as provable all bisequents of 
the form $\Rightarrow\ \mid \Gamma \Rightarrow \varphi$, which is a consequence of the fact that LP is determined by 
a 2-matrix. All the results established for BSC-K3 hold for BSC-LP. 
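
On the semantic side, the difference between the two inputs can be illustrated by brute force. The sketch below is an illustration under stated assumptions: it uses the standard strong Kleene tables (negation as 1 - x, conjunction as min, disjunction as max, implication as max(1 - x, y)), which K3 and LP share, together with an assumed tuple encoding of formulae; all names are hypothetical. It checks validity of a bisequent, not provability in BSC.

```python
from itertools import product

VALUES = (1, 0.5, 0)  # assumed encoding of true, undefined, false

def ev(f, h):
    """Evaluate a formula (atom name or nested tuple) under strong Kleene tables."""
    if isinstance(f, str):
        return h[f]
    op, *args = f
    if op == "not":
        return 1 - ev(args[0], h)
    a, b = (ev(x, h) for x in args)
    if op == "and":
        return min(a, b)
    if op == "or":
        return max(a, b)
    if op == "imp":
        return max(1 - a, b)
    raise ValueError(op)

def atoms(f):
    return {f} if isinstance(f, str) else set().union(*(atoms(x) for x in f[1:]))

def valid(gamma, delta, pi, sigma):
    """Validity of gamma => delta | pi => sigma by enumerating all valuations."""
    names = sorted(set().union(*(atoms(f) for f in (*gamma, *delta, *pi, *sigma))))
    for values in product(VALUES, repeat=len(names)):
        h = dict(zip(names, values))
        if not (any(ev(f, h) != 1 for f in gamma)
                or any(ev(f, h) == 1 for f in delta)
                or any(ev(f, h) == 0 for f in pi)
                or any(ev(f, h) != 0 for f in sigma)):
            return False
    return True

def k3_entails(premises, phi):
    # K3: the input occupies the 1-sequent.
    return valid(premises, [phi], [], [])

def lp_entails(premises, phi):
    # LP: the same input occupies the 2-sequent.
    return valid([], [], premises, [phi])
```

With this encoding, excluded middle holds in LP but not in K3, while explosion and modus ponens hold in K3 but not in LP; only the position of the input changes.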
4 Bisequent Calculi for Other Logics 


We provide sets of rules adequate for all logics described in Sect. 2. Every operation 
will be characterised by four rules of introduction to antecedents and consequents 
of 1- and 2-sequents. The rules are devised on the basis of geometrical 
insights based on the tabular representation of the respective connective: to 
establish the premisses for the rule with the principal formula in one of the four 
positions in a bisequent, we just examine its tabular representation. For example, 
if the indicated values of the arguments form a rectangle, one premiss is enough; 
in the case of more complex shapes, two or three premisses are required. Since the 
process of constructing rules on the basis of tables is not deterministic, we do 
not propose any algorithm for that aim; however, by the end of this section we 
will illustrate the method with one example. In every case it holds that either: 


$\Gamma \models_L \varphi$ iff BSC-L $\vdash \Gamma \Rightarrow \varphi \mid\ \Rightarrow$ or $\Gamma \models_L \varphi$ iff BSC-L $\vdash\ \Rightarrow\ \mid \Gamma \Rightarrow \varphi$ 


depending on whether $\models_L$ denotes the consequence relation for logics characterised 
by 1-matrices or by 2-matrices. Adequacy of BSC-L for all concrete 
logics is proved in the same way as for BSC-K3. Therefore we limit our presentation 
to a systematic characterisation of rules from which the BSC for a suitable 
logic can be composed. 

We start with rules for the respective unary operations (including Łukasiewicz’s 
modalities): 


(74>) 


r= A|HI = X, gy, r > A|> X 
r= A|~7y, I > X 

r= 4, | y, I >X r>A,o|yo,0> 2 
TAIS, “y, r SA] ISS 
TsSsA|I>X,o g, r> A|> X 

(= =pr |) 

TsA,-y|7T>2Z 

The remaining rules in each case (namely (~y=> |), (72H |), (78>), 
(78) (-ep=> |), (}>P |), (| >pep=>) and (|}—pp)) are like respective rules of 
BSC-K3. Consider premisses of (|=—p) and (~pp=|) displaying two occurrences 
of the same side formula: in semantical terms it gives the effect of evaluating y 
as undefined. 


(| -P=>) 


(l>7P) 


(=pp>)) 
r= A|y, [>X r= A|UI>s YX, 


(0 I) ar sA nan TSA, ov|1>2Z 
r= A4A|y, I> X r= A|HI = X, 
19>) nae EE 
= |) g, r> A|> xX l) r= 4,gy| I> xX 
g, r> A|> X r> 4, Dy| H> X 
g, r> A|> X r= 4,y|I >X 
UES r= A] Bias ) r> A4A|H > X, 0y 


Not surprisingly, rules introducing a modal formula to antecedents or to succedents 
of 1- and 2-sequents have the same premisses; this is a consequence of the 
fact that such a formula is never undefined. The same remark applies to rules for 
A and TB. 

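The set of rules for weak $\wedge$, $\vee$, $\rightarrow$ is also partly identical with those for BSC-K3. 
The identical rules are $(\wedge_w \Rightarrow \mid)$, $(\Rightarrow \wedge_w \mid)$, $(\mid \Rightarrow \vee_w)$, $(\mid \vee_w \Rightarrow)$, $(\mid \Rightarrow \rightarrow_w)$ and 
$(\mid \rightarrow_w \Rightarrow)$. In the remaining cases we have three-premiss rules: 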
r= Ajy, p, I> E TSA, g|y, I> x2 r> A4, y| y, H> 


Ausa rs Aj|gryp, HSE 
| n e ARAR g, > A| H> Xv yp, r => A|HI >X, p 
< PSA\|l> Venu 
Vw | r> 4y, y| H> X r> 4,gy|yp, H> r> A4, y|y, H> 
m r> 4yvy| I> X 
Vas | y,y, r > A|> p, r> A|Is Xv p, => A| I> SX, g 


eyy, > A|> 


) r> A4, y|, [>X r> 4,gy|yp, H> X r> A4, y|y, H> 
. PrsAyoy|TlsF5 


y,4y, r > A|> r> A|I s X, p, 4% p, r => A|> X, p 
p> pr => A|Hi>yX 


v> |) 


In the case of K3” and K$ the specific rules are: 


r>A|\|T>Z,9 gp, > A| s 2X, 


=>Am 
(l c) rs A|> Z, yny 
r= A4A|y, y, I> X r= 4,ọ|y, U >58 
(| Amc) 
TS>AlpAv,1> 2 
g, r> A|> v,TsSA\|T>X,¢ 
(V=>mc |) 
eVu¥,rsoA\TssZ 
r= 4,y, y| I> X r= 4,ọ|y, U >X 
(=>Vmc |) 
r= 4,yvy| I> 2 
r= 4, y|y, 70> r= 4, |y, I => X 
(>>mc |) 


r= 4,p> y| I> X 
gv, r > A|> x r= A|HI>= X, 


ISi rS>A\T1> 2,0 v,TsSA\|I>X,¢ 
= TS>A|\|T>Z,pAw 
isa r= A|, y, H> X r> A, y|, H> X 
= TSAlvAv,iss 
ae) p, r> A| Hs Zu prs A|>% 
= oVvy,FrsA\Tsz 
r>A,y9,y|1>2 r> A, y|, H> x 
(=>vx |) 
r= 4, gyvy| i> X 
G l) r= 4, y| y, I >X r> A, y |4, I> X 
a rsAy-v|fsS 
yp, r >= A|> g r>A|\T>Z,9,v 
(+= «x |) 


pow=rsAl\Ts>yz 


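The remaining rules in both cases are identical with $(\wedge \Rightarrow \mid)$, $(\Rightarrow \wedge \mid)$, $(\mid \Rightarrow \vee)$, 
$(\mid \vee \Rightarrow)$, $(\mid \Rightarrow \rightarrow)$ and $(\mid \rightarrow \Rightarrow)$ from BSC-K3. 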

The implication of Łukasiewicz [34] and his specific additive $\wedge_{\text{Ł}}$ and $\vee_{\text{Ł}}$ are 
characterised by the following rules: 


gp, > Ajy, H> X yrs Ajy, I> X 
TS>AlypAv,0> 25 


(| Ar >) 


rs 4p, y| H> X p, r> A| Is x,y% p, r => A| Is X, g 
r> A|> X pny 


(| > Av) 


gp, > Alv,t> X prs Ajy, I> 
r= A4A|ọyvy Is SZ 


(| Vi =) 


r>A,9,v0|f>2 p, r> A|> Xv p, r => A| UIs YX, g 
rs A|> X, yvy 


g, => 4, y| H> X r= A|yp, H> X, y% 


(=n T>s4y>y| ISX 


rs 4,y|y, H> X yp, r > A|> X r>A|T>Z,¢ 


aa! fbr Sala sA 


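The remaining rules are identical with $(\wedge \Rightarrow \mid)$, $(\Rightarrow \wedge \mid)$, $(\Rightarrow \vee \mid)$, $(\vee \Rightarrow \mid)$, $(\mid \Rightarrow \rightarrow)$ 
and $(\mid \rightarrow \Rightarrow)$ from BSC-K3. 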

For Sobociński’s connectives we have: 
gp, r> Ajly, H> X prs Ajly, ISX 


Ae ọn, >A] 
r> 4y, y| H> X p, r> A| Is X, 4% p, => A| H= SX, g 


(= As |) T>S>A,pAv|1>2z 
E r= 4, y| H= 2,0 r= 4, y| Ilis X, 9 
2 rs A|H= X, yvý 
(| Vs=) r> A|, yp, H> E p r> A|> Xv p, r => A| H> XZ, 
z T>sAAļ|ọvVy, ISX 
E p, l> 4, y| H> X r= A|, U= X, 4% 
3 PrsAlf@s+S,~p—->%0 
( (22 p, r > A|> X rs A4A|Is X, 
S 


r=>=A4A|y>4HI> E 


The remaining rules are as in BSC-K3. 
Sette’s connectives are characterised by the following rules: 


PoA\T>Z,9 rs A|> YX, %4 
PSA, pAy|T>Z 


(>Ase |) 


r= A|yp, y, I> E 
gy^ r> A|> X 


P>A|\T>Z,9,¥ 
To>A,ypvVeo|T>Z 


(Ase) (>Vse |) 


r= A4|y, >85 rsAlv,0l>2 


Vse => 
(Vse=l) ọvy >A] X 


rs A|> x,y rs A4A]|y Is x 
g> yr >= A|> X 


(=>se=]) 


r= A|, H= 2X, 
r= 4,p>y| IIs x 


S| p, I> Zw 
S| I = X, >p 


(>>sel) ((=>—>se) 


SI>, S|y H> E 
S|lgy> yp, H> 


(>se=) 
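$(\mid \wedge_{se} \Rightarrow)$, $(\mid \Rightarrow \wedge_{se})$, $(\mid \Rightarrow \vee_{se})$, $(\mid \vee_{se} \Rightarrow)$ are as in BSC-K3. 
Finally, the Carnielli and Sette connectives characterising $I^1$ and $I^2$: 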


r= 4,py| H> 8 rs 4, y| Hs 
rs A|> X, pry 


(I>Ac) 


gv,rsAl\T>x 
TSA\lpAv,1> 25 


r=>A,9,¥|\1>2 


AN 
(c>) rS>A|\|T>Z,evo 


(l>Ve) 
grsAl|fss 7y,rsoAl\tsaz 


Marr) TsAlove,lsS 
(| We edict! Ese y, rs A|> gx 
S T>sAļy>y, IX 
i ta ae ( ) gp, r= 4, y4Y]|S 
> 
OTAJ U>, >y CV TSAe-v|S 
r>A,e|S v¥,rsAls 
Pas) | | 


pow,r>A|s 


$(\wedge_c \Rightarrow \mid)$, $(\Rightarrow \wedge_c \mid)$, $(\Rightarrow \vee_c \mid)$, $(\vee_c \Rightarrow \mid)$ are as in BSC-K3. 

We finish with the characterisation of the remaining implications introduced 
in Sect. 2. In most cases it is obtained by combining rules which were previously 
introduced. In particular: 

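Słupecki’s [49] implication is characterised by means of: $(\mid \Rightarrow \rightarrow)$ and $(\mid \rightarrow \Rightarrow)$ 
from BSC-K3 as well as $(\Rightarrow \rightarrow_c \mid)$ and $(\rightarrow_c \Rightarrow \mid)$. 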

Heyting’s implication [22] is characterised by means of: (>,=> |), (œ>—>z |), 
(l>—se),(l>se>). 

D’Ottaviano/DaCosta/Jaskowski/Stupecki’s implication [13,30,48] is char- 
acterised by means of: (>> |), (>> |), (\>—se),(/ ~se=>). 

Rescher’s implication [43] is characterised by means of: (>,=> |), (>—1z |), 
[Bess]: 

Tomova’s implication [22] is characterised by means of: (>z |), (>>z |), 
(lI>>c),(l>c>). 

Only in case of Sobocinski’s implication >‘, we have a pair of new rules: 


yp, r > A|> r= A|Is X, p,% r> 4, y|, %4, I> X 
(0 yp, T AJH py 


(+s=>) 


pr > 4A y| H> 8 r> A4A|y, l> Zy FSA, |v,1> 2,9 


(>—>5l) Pode pres 


The remaining two rules are: (|>—s,) and (|>s-=>). 
Let us show how (=—‘,|) was obtained on the basis of the table for >‘, from 
p. 4. py > wv is either false or undefined which corresponds to four cells: 


—'g{1 u 0 
1 | uO/this row says that y is 1 and wv is 0 or u — the left premiss; 
u 0 
0 | u J|this row says that ọ is 0 and w is u— the right premiss. 
The remaining premiss covers the cell with 0 in the first and second rows 
attributed to $\psi$ while $\varphi$ is 1 or u. Note that since the left premiss covers the first 
row and the right premiss covers the last row we could alternatively formulate 
the middle premiss as $\Gamma \Rightarrow \Delta, \varphi \mid \varphi, \Pi \Rightarrow \Sigma, \psi$ to cover exactly the cell with 0 
in the second row (here $\varphi$ is just u) but since $\psi$ is 0 in two rows where $\varphi$ is 1 
or u we can be more economical here. A reader can check that many rules can 
be formulated in an alternative way. We always tried to find the most economical 
representation which can also be used easily for proving syntactically the cut 
elimination theorem (which will be shown in the extended version of this paper). 

Now, consider an arbitrary connective c of the logic L, the corresponding 
operation c as characterised by suitable matrix determining L in Sect. 2, and the 
four rules for c. It holds: 


Theorem 4. For all presented rules characterising arbitrary c of any L: all 
premisses are L-valid iff the conclusion is L-valid. 


Proof. This is an analogue of Theorem 1 for any considered logic L which implies 
adequacy of respective BSC-L. 


5 Interpolation 


We present a constructive proof of the interpolation theorem for some logics 
based on the strategy proposed by Muskens and Wintein [58]. It was originally 
applied in a tableau setting for Belnap-Dunn four-valued logic as well as for K3 
and LP. Here we demonstrate that BSC can also be used for showing that 
interpolation holds for some paracomplete and paraconsistent logics. Let $L \in 
\{I^1, I^2, P^1, P^2\}$. 


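Theorem 5. For any contingent formulae $\varphi$, $\psi$, if $\varphi \models_L \psi$, then we can construct 
an interpolant for $I^1$, $I^2$ on the basis of proof-search trees for $\varphi \Rightarrow\ \mid\ \Rightarrow$ and 
$\Rightarrow \psi \mid\ \Rightarrow$ and an interpolant for $P^1$, $P^2$ on the basis of proof-search trees for 
$\Rightarrow\ \mid \varphi \Rightarrow$ and $\Rightarrow\ \mid\ \Rightarrow \psi$ in suitable BSC-L. 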


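Proof. We will demonstrate the case of BSC-$I^1$; the case of BSC-$I^2$ is identical 
and the cases of BSC-$P^1$ and BSC-$P^2$ are dual, so we only comment on them 
in the key points. Assume that $\varphi \models_{I^1} \psi$; hence by completeness we have a 
cut-free proof of $\varphi \Rightarrow \psi \mid\ \Rightarrow$ in BSC-$I^1$. Now produce complete proof-search 
trees for $\varphi \Rightarrow\ \mid\ \Rightarrow$ and $\Rightarrow \psi \mid\ \Rightarrow$. Since $\varphi$, $\psi$ are contingent, they have some 
non-axiomatic leaves. Let $\Gamma_1 \Rightarrow \Delta_1 \mid \Pi_1 \Rightarrow \Sigma_1, \ldots, \Gamma_k \Rightarrow \Delta_k \mid \Pi_k \Rightarrow \Sigma_k$ be 
the list of non-axiomatic atomic leaves of the proof-search tree for $\varphi \Rightarrow\ \mid\ \Rightarrow$ and 
$\Theta_1 \Rightarrow \Lambda_1 \mid \Xi_1 \Rightarrow \Omega_1, \ldots, \Theta_n \Rightarrow \Lambda_n \mid \Xi_n \Rightarrow \Omega_n$ such a list taken from the 
proof-search tree for $\Rightarrow \psi \mid\ \Rightarrow$. It holds: 


Claim (1). For any $i \leq k$ and $j \leq n$, $\Gamma_i, \Theta_j \Rightarrow \Delta_i, \Lambda_j \mid \Pi_i, \Xi_j \Rightarrow \Sigma_i, \Omega_j$ is an 
axiomatic atomic bisequent. 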
To see this take a tree for $\varphi \Rightarrow\ \mid\ \Rightarrow$ and add $\psi$ to succedents of all 1-sequents 
in the tree. Due to context independence of all rules it is a correct proof-search 
tree. Now for each leaf $\Gamma_i \Rightarrow \Delta_i, \psi \mid \Pi_i \Rightarrow \Sigma_i$ append a tree of $\Rightarrow \psi \mid\ \Rightarrow$ but 
with $\Gamma_i$ added to each antecedent and $\Delta_i$ added to each succedent of 1-sequents, 
and similarly with $\Pi_i$ and $\Sigma_i$ in all 2-sequents. In the resulting proof-search tree 
we have leaves of the form $\Gamma_i, \Theta_j \Rightarrow \Delta_i, \Lambda_j \mid \Pi_i, \Xi_j \Rightarrow \Sigma_i, \Omega_j$ for all $i \leq k$ and 
$j \leq n$. If at least one of them is not axiomatic, then $\nvdash \varphi \Rightarrow \psi \mid\ \Rightarrow$. 


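Next for every $\Gamma_i \Rightarrow \Delta_i \mid \Pi_i \Rightarrow \Sigma_i$, $i \leq k$, define the following sets: 

$\Gamma'_i = \Gamma_i \cap (\bigcup \Lambda_j \cup \bigcup \Omega_j)$ for $j \leq n$ 

$\Delta'_i = \Delta_i \cap \bigcup \Theta_j$ for $j \leq n$ 

$\Pi'_i = \Pi_i \cap \bigcup \Omega_j$ for $j \leq n$ 

$\Sigma'_i = \Sigma_i \cap (\bigcup \Theta_j \cup \bigcup \Xi_j)$ for $j \leq n$ 

Since every $\Gamma_i, \Theta_j \Rightarrow \Delta_i, \Lambda_j \mid \Pi_i, \Xi_j \Rightarrow \Sigma_i, \Omega_j$ is axiomatic we are guaranteed 
that $\Gamma'_i \cup \Delta'_i \cup \Pi'_i \cup \Sigma'_i \neq \emptyset$. Note also that $AT(\Gamma'_i \cup \Delta'_i \cup \Pi'_i \cup \Sigma'_i) \subseteq AT(\varphi) \cap AT(\psi)$, 
where AT stands for the set of atoms. Now define an interpolant $Int(\varphi, \psi)$ for 
the considered logics. For $I^1$, $I^2$ it has the same form: 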


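$$\bigwedge \Gamma'_1 \wedge \bigwedge \neg\Sigma'_1 \wedge \neg(\bigvee \neg\Pi'_1 \vee \bigvee \Delta'_1) \vee \ldots \vee \bigwedge \Gamma'_k \wedge \bigwedge \neg\Sigma'_k \wedge \neg(\bigvee \neg\Pi'_k \vee \bigvee \Delta'_k),$$ 


where $\neg\Pi$ means the set of negations of all elements in $\Pi$. 
For $P^1$, $P^2$ $Int(\varphi, \psi)$ is defined as: 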


AIAN A A ( VvV 24) vev AANA Anh V aT, VvV 5) 


We can show that: 


Claim (2). $Int(\varphi, \psi)$ is an interpolant for $\varphi \models_L \psi$. 


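Proof. As an example, we present the proof for BSC-$I^1$. For the sake of proof let 
us recall that BSC-$I^1$ consists of the rules characterising $\wedge_c$, $\vee_c$, $\rightarrow_c$ and $\neg_H$. 
However, most of the rules necessary for conducting the proof are identical with 
respective rules from BSC-K3, so the label C in their names will be omitted in 
these cases for easier recognition where the specific rules (concretely $(\mid \vee_c \Rightarrow)$ 
and $(\mid \Rightarrow \vee_c)$) are required. 

Since for every $\bigwedge \Gamma'_i \wedge \bigwedge \neg\Sigma'_i \wedge \neg(\bigvee \neg\Pi'_i \vee \bigvee \Delta'_i)$ all (negated) atoms are 
by definition taken from $AT(\varphi) \cap AT(\psi)$, we must only prove that BSC-$I^1$ $\vdash 
\varphi \Rightarrow Int(\varphi, \psi) \mid\ \Rightarrow$ and BSC-$I^1$ $\vdash Int(\varphi, \psi) \Rightarrow \psi \mid\ \Rightarrow$ (the same for BSC-$I^2$), and 
BSC-$P^1$ $\vdash\ \Rightarrow\ \mid \varphi \Rightarrow Int(\varphi, \psi)$ and BSC-$P^1$ $\vdash\ \Rightarrow\ \mid Int(\varphi, \psi) \Rightarrow \psi$ (and the same for 
BSC-$P^2$). 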

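Again take a complete proof-search tree for $\varphi \Rightarrow\ \mid\ \Rightarrow$ and add $Int(\varphi, \psi)$ to 
every succedent of 1-sequent. For every $\Gamma_i \Rightarrow \Delta_i, Int(\varphi, \psi) \mid \Pi_i \Rightarrow \Sigma_i$ apply 
$(\Rightarrow \vee \mid)$ to get 


$$\Gamma_i \Rightarrow \Delta_i, \bigwedge \Gamma'_i \wedge \bigwedge \neg\Sigma'_i \wedge \neg(\bigvee \neg\Pi'_i \vee \bigvee \Delta'_i), Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow \Sigma_i,$$ 


where $Int(\varphi, \psi)^{-i}$ is the rest of the disjunction (if any). Applying $(\Rightarrow \wedge \mid)$ we 
obtain three bisequents: 

(a) $\Gamma_i \Rightarrow \Delta_i, \bigwedge \Gamma'_i, Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow \Sigma_i$. 

(b) $\Gamma_i \Rightarrow \Delta_i, \bigwedge \neg\Sigma'_i, Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow \Sigma_i$. 

(c) $\Gamma_i \Rightarrow \Delta_i, \neg(\bigvee \neg\Pi'_i \vee \bigvee \Delta'_i), Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow \Sigma_i$. 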


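Systematically applying $(\Rightarrow \wedge \mid)$ to (a) we obtain $\Gamma_i \Rightarrow \Delta_i, p, Int(\varphi, \psi)^{-i} \mid 
\Pi_i \Rightarrow \Sigma_i$ for each $p \in \Gamma'_i$ and since $\Gamma'_i \subseteq \Gamma_i$ they are all axiomatic. Similarly 
with (b) but now we first obtain $\Gamma_i \Rightarrow \Delta_i, \neg p, Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow \Sigma_i$ for each 
$p \in \Sigma'_i$. After the application of $(\Rightarrow \neg \mid)$ we obtain $\Gamma_i \Rightarrow \Delta_i, Int(\varphi, \psi)^{-i} \mid 
p, \Pi_i \Rightarrow \Sigma_i$ which is axiomatic since $\Sigma'_i \subseteq \Sigma_i$. For (c) we first apply $(\Rightarrow \neg \mid)$ and 
obtain $\Gamma_i \Rightarrow \Delta_i, Int(\varphi, \psi)^{-i} \mid \bigvee \neg\Pi'_i \vee \bigvee \Delta'_i, \Pi_i \Rightarrow \Sigma_i$. By $(\mid \vee_c \Rightarrow)$ we obtain: 
$\bigvee \neg\Pi'_i, \Gamma_i \Rightarrow \Delta_i, Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow \Sigma_i$ and $\bigvee \Delta'_i, \Gamma_i \Rightarrow \Delta_i, Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow 
\Sigma_i$. Systematic application of $(\vee \Rightarrow \mid)$ to the latter produces axiomatic bisequents 
$p, \Gamma_i \Rightarrow \Delta_i, Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow \Sigma_i$ for each $p \in \Delta'_i$. Systematic application of 
$(\vee \Rightarrow \mid)$ to the former produces $\neg p, \Gamma_i \Rightarrow \Delta_i, Int(\varphi, \psi)^{-i} \mid \Pi_i \Rightarrow \Sigma_i$ for each 
$p \in \Pi'_i$. After application of $(\neg \Rightarrow \mid)$ they also yield axiomatic sequents. Hence 
we have a proof of $\varphi \Rightarrow Int(\varphi, \psi) \mid\ \Rightarrow$. 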

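We have to do the same with a complete proof-search tree for $\Rightarrow \psi \mid\ \Rightarrow$ 
but now adding $Int(\varphi, \psi)$ to every antecedent of all 1-sequents in the tree. For 
every leaf $Int(\varphi, \psi), \Theta_j \Rightarrow \Lambda_j \mid \Xi_j \Rightarrow \Omega_j$ we apply $(\vee \Rightarrow \mid)$ to each disjunct 
of $Int(\varphi, \psi)$ until we get leaves: $\bigwedge \Gamma'_1 \wedge \bigwedge \neg\Sigma'_1 \wedge \neg(\bigvee \neg\Pi'_1 \vee \bigvee \Delta'_1), \Theta_j \Rightarrow \Lambda_j \mid 
\Xi_j \Rightarrow \Omega_j, \ldots, \bigwedge \Gamma'_k \wedge \bigwedge \neg\Sigma'_k \wedge \neg(\bigvee \neg\Pi'_k \vee \bigvee \Delta'_k), \Theta_j \Rightarrow \Lambda_j \mid \Xi_j \Rightarrow \Omega_j$. To each 
such leaf we apply $(\wedge \Rightarrow \mid)$ obtaining bisequents of the form $\Gamma'_i, \neg\Sigma'_i, \neg(\bigvee \neg\Pi'_i \vee 
\bigvee \Delta'_i), \Theta_j \Rightarrow \Lambda_j \mid \Xi_j \Rightarrow \Omega_j$ for $i \leq k$, $j \leq n$. In each case the application of $(\neg \Rightarrow 
\mid)$ yields $\Gamma'_i, \Theta_j \Rightarrow \Lambda_j \mid \Xi_j \Rightarrow \Omega_j, \Sigma'_i, \bigvee \neg\Pi'_i \vee \bigvee \Delta'_i$. The application of $(\mid \Rightarrow \vee_c)$ 
to $\bigvee \neg\Pi'_i \vee \bigvee \Delta'_i$ yields $\Gamma'_i, \Theta_j \Rightarrow \Lambda_j, \bigvee \neg\Pi'_i, \bigvee \Delta'_i \mid \Xi_j \Rightarrow \Omega_j, \Sigma'_i$. Systematic 
application of $(\Rightarrow \vee \mid)$ and $(\Rightarrow \neg \mid)$ gives leaves of the form $\Gamma'_i, \Theta_j \Rightarrow \Lambda_j, \Delta'_i \mid 
\Pi'_i, \Xi_j \Rightarrow \Omega_j, \Sigma'_i$. Since for every $i \leq k$, $j \leq n$, $\Gamma_i, \Theta_j \Rightarrow \Delta_i, \Lambda_j \mid \Pi_i, \Xi_j \Rightarrow \Sigma_i, \Omega_j$ 
is axiomatic these primed versions are axiomatic too. Assume the contrary, then 
it must be e.g. some $p \notin \Gamma'_i$ such that either $p \in \Gamma_i \cap \Lambda_j$ or $p \in \Gamma_i \cap \Omega_j$ (or for 
other pairs generating axioms). But it is impossible since by definition $\Gamma'_i$ must 
contain such $p$ (and the same for other cases of primed sets). 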


The proof for BSC-$I^2$ is identical since the only difference between these two 
logics is that $I^1$ has Heyting’s negation whereas in $I^2$ it is Kleene’s negation. But 
the two BSC rules for negation which are used in the proof are common to both 
negations. The proof for $P^1$, $P^2$ is dual to the above and uses the slightly different 
definition of $Int(\varphi, \psi)$ specified above. Again the two logics differ only with 
respect to negations, but the rules used in the proof are common to Bochvar’s 
and Kleene’s one. 

Finally, note that this proof may also be applied to other logics, but in 
some cases it is convenient to extend their languages. For example, interpolants 
for some logics can be defined as disjunctions of the following formulae: 


For Ka - APSA AaS! A Aap dA Aap 
For LP- A Hi A No; A Nog X A Analy 
For Gs - ALi Ang (AIE > VDA AmB 
For G5 - AW A N og4A; A Ngy; A N ogosgl” 
6 Conclusion 


Bisequent calculi can be seen as one of the possible syntactical realizations of the 
so-called Suszko’s thesis [54] in the treatment of many-valued logics. According to 
Suszko every logic is two-valued in the sense that all values are divided into des- 
ignated and non-designated and this is reflected in the definition of consequence 
relation. In the case of bisequent calculi it is additionally made evident that two 
possible choices of designated values can be made. However, on a deep level a 
BSC is similar to some other proposed formalisations mentioned in the Intro- 
duction. On one hand bisequents resemble several labelled approaches where 
labels denote sets of values; a difference is that instead of labels a position of 
a formula in a bisequent is crucial, hence the method is strictly syntactical. On 
the other hand, there is a similarity with Avron’s [4] and Avron, Ben-Naim, and 
Konikowska’s [6] sequent calculi with special rules defined for negated formulae; 
a difference is that BSC satisfies ordinary subformula property and purity condi- 
tions to the effect that in schemata of rules only one (occurrence of a) connective 
is involved. The price is that instead of standard sequents we use a pair of them. 

As we mentioned in the Introduction there is one more general difference. 
In the case of labelled calculi or Avron’s SC we have the same input for 1- and 
2-logics, whereas in BSC a different input for both classes of logics is defined; a 1- 
or a 2-sequent in a bisequent. A consequence of our choice is that for every pair 
of 1- and 2-logic with the same connectives (like e.g. K3 and LP) the rules and 
axioms are identical. In contrast, in other mentioned approaches for such pairs 
of related logics, the respective calculi must differ either with respect to some 
axioms (closure conditions in tableaux) or to rules. It seems that the present 
solution where systems differ only with respect to the input is more economical 
and uniform. In fact we can consider also logics determined by different notions 
of consequence relations while still keeping the rules and axioms intact. Two 
relations considered in the text express informally the situation where either 
truth is preserved or non-falsity is preserved. But two other possibilities are open 
as well: $\Gamma \Rightarrow\ \mid\ \Rightarrow \varphi$ corresponds to the notion of no-counterexample consequence 
(see e.g. Lehmann [33], Paoli [39]), whereas $\Rightarrow \varphi \mid \Gamma \Rightarrow$ corresponds to the 
liberal consequence which leads from non-falsity to truth. This level of uniformity 
follows from the fact that rules of BSC are not computed on the basis of any 
normal (disjunctive or conjunctive) form, like in other approaches, but on the 
basis of geometrical insights illustrated in Sect. 4. 

Finally notice that the application of BSC may be extended easily to first- 
order languages. It is quite obvious how to define suitable rules for quantifiers. 
But the proof of adequacy requires more refined methods than those applied here, 
so for lack of space we limited ourselves to the propositional case. However, we 
finish the paper with one more problem for further investigation: the application 
of first-order BSC to formalisation of neutral free logics, and in particular to 
specific theories of definite descriptions based on some Fregean ideas (see e.g. 
Lehmann [33], Stenlund [52]). Since sequent and tableau calculi for such theories 
built on positive and negative free logics were already provided in [24,26,28], this 
paper offers a proper ground for extension of these results to neutral free logics. 
Abstract. Dependency pairs are one of the most powerful techniques 
to analyze termination of term rewrite systems (TRSs) automatically. 
We adapt the dependency pair framework to the probabilistic setting in 
order to prove almost-sure innermost termination of probabilistic TRSs. 
To evaluate its power, we implemented the new framework in our tool 
AProVE. 


1 Introduction 


Techniques and tools to analyze innermost termination of term rewrite systems 
(TRSs) automatically are successfully used for termination analysis of programs 
in many languages (e.g., Java [10,35,38], Haskell [18], and Prolog [19]). While 
there exist several classical orderings for proving termination of TRSs (e.g., 
based on polynomial interpretations [30]), a direct application of these orderings 
is usually too weak for TRSs that result from actual programs. However, these 
orderings can be used successfully within the dependency pair (DP) framework 
[2,16,17]. This framework allows for modular termination proofs (e.g., which 
apply different orderings in different sub-proofs) and is one of the most powerful 
techniques for termination analysis of TRSs that is used in essentially all current 
termination tools for TRSs, e.g., AProVE [20], MU-TERM [22], NaTT [40], 
TTT2 [29], etc. 

On the other hand, probabilistic programs are used to describe randomized 
algorithms and probability distributions, with applications in many areas. To 
use TRSs also for such programs, probabilistic term rewrite systems (PTRSs) 
were introduced in [8,9]. In the probabilistic setting, there are several notions of 
“termination”. A program is almost-surely terminating (AST) if the probability 
for termination is 1. As remarked in [24]: “AST is the classical and most widely- 
studied problem that extends termination of non-probabilistic programs, and 
is considered as a core problem in the programming languages community”. 
A strictly stronger notion is positive almost-sure termination (PAST), which 
requires that the expected runtime is finite. While there exist many automatic 
approaches to prove (P)AST of imperative programs on numbers (e.g., [1,4,11, 
15,21,24-26,32-34,36]), there are only a few automatic approaches for programs 
with complex non-tail recursive structure [7,12], and even fewer approaches which 
are also suitable for algorithms on recursive data structures [3,6,31,39]. The 
approach of [39] focuses on algorithms on lists and [31] mainly targets algorithms 
on trees, but they cannot easily be adjusted to other (possibly user-defined) data 
structures. The calculus of [6] considers imperative programs with stack, heap, 
and pointers, but it is not yet automated. Moreover, the approaches of [3,6,31, 39] 
analyze expected runtime, while we focus on AST. 

PTRSs can be used to model algorithms (possibly with complex recursive 
structure) operating on algebraic data types. While PTRSs were introduced in [8, 
9], the first (and up to now only) tool to analyze their termination automatically 
was presented in [3], where orderings based on interpretations were adapted 
to prove PAST. Moreover, [14] extended general concepts of abstract rewrite 
systems (e.g., confluence and uniqueness of normal forms) to the probabilistic 
setting. 

As mentioned, already for non-probabilistic TRSs a direct application of 
orderings (as in [3]) is limited in power. To obtain a powerful approach, one 
should combine such orderings in a modular way, as in the DP framework. In 
this paper, we show for the first time that an adaption of dependency pairs to 
the probabilistic setting is possible and present the first DP framework for prob- 
abilistic term rewriting. Since the crucial idea of dependency pairs is the modu- 
larization of the termination proof, we analyze AST instead of PAST, because 
it is well known that AST is compositional, while PAST is not (see, e.g., [25]). 
We also present a novel adaption of the technique from [3] for the direct appli- 
cation of polynomial interpretations in order to prove AST (instead of PAST) 
of PTRSs. 

We start by briefly recapitulating the DP framework for non-probabilistic 
TRSs in Sect. 2. Then we recall the definition of PTRSs based on [3,9,14] in 
Sect. 3 and introduce a novel way to prove AST using polynomial interpretations 
automatically. In Sect.4 we present our new probabilistic DP framework. The 
implementation of our approach in the tool AProVE is evaluated in Sect.5. We 
refer to [28] for all proofs (which are much more involved than the original proofs 
for the non-probabilistic DP framework from [2,16,17]). 


2 The DP Framework 


We assume familiarity with term rewriting [5] and regard TRSs over a finite signature 
$\Sigma$ and a set of variables $\mathcal{V}$. A polynomial interpretation Pol is a $\Sigma$-algebra 
with carrier set $\mathbb{N}$ which maps every function symbol $f \in \Sigma$ to a polynomial 
$f_{Pol} \in \mathbb{N}[\mathcal{V}]$. For a term $t \in T(\Sigma, \mathcal{V})$, $Pol(t)$ denotes the interpretation of $t$ by 
the $\Sigma$-algebra Pol. An arithmetic inequation like $Pol(t_1) > Pol(t_2)$ holds if it is 
true for all instantiations of its variables by natural numbers. 
Theorem 1 (Termination With Polynomial Interpretations [30]). Let $R$ 
be a TRS and let $Pol : T(\Sigma, \mathcal{V}) \to \mathbb{N}[\mathcal{V}]$ be a monotonic polynomial interpretation 
(i.e., $x > y$ implies $f_{Pol}(\ldots, x, \ldots) > f_{Pol}(\ldots, y, \ldots)$ for all $f \in \Sigma$). If for every 
$\ell \to r \in R$, we have $Pol(\ell) > Pol(r)$, then $R$ is terminating. 


The search for polynomial interpretations is usually automated by SMT solv- 
ing. Instead of polynomials over the naturals, Theorem 1 (and the other ter- 
mination criteria in the paper) can also be extended to polynomials over the 
non-negative reals, by requiring that whenever a term is “strictly decreasing”, 
then its interpretation decreases at least by a certain fixed amount $\delta > 0$. 


Example 2. Consider the TRS $R_{div} = \{(1), \ldots, (4)\}$ for division from [2]. 

minus(x, O) → x (1) 
minus(s(x), s(y)) → minus(x, y) (2) 
div(O, s(y)) → O (3) 
div(s(x), s(y)) → s(div(minus(x, y), s(y))) (4) 


Termination of $R_{minus} = \{(1), (2)\}$ can be proved by the polynomial interpretation 
that maps minus(x, y) to x + y + 1, s(x) to x + 1, and O to 0. However, a 
direct application of classical techniques like polynomial interpretations fails for 
$R_{div}$. These techniques correspond to so-called (quasi-)simplification orderings 
[13] which cannot handle rules like (4) where the right-hand side is embedded 
in the left-hand side if y is instantiated with s(x). In contrast, the dependency 
pair framework is able to prove termination of $R_{div}$ automatically. 
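
The interpretation for the minus-rules can be sanity-checked by evaluating both sides of each rule. The sketch below is only an illustration: the tuple encoding of terms, the dictionary names, and the finite grid of test values are assumptions, and a grid check is not a proof (here the difference between both sides happens to be a constant, which is what makes the inequality hold for all naturals).

```python
from itertools import product

# Variables are strings; applications are tuples (symbol, arg1, ...).
POL = {
    "O": lambda: 0,
    "s": lambda x: x + 1,
    "minus": lambda x, y: x + y + 1,
}

MINUS_RULES = {
    1: (("minus", "x", ("O",)), "x"),
    2: (("minus", ("s", "x"), ("s", "y")), ("minus", "x", "y")),
}

def interpret(term, env):
    """Evaluate a term under POL for a given assignment of naturals to variables."""
    if isinstance(term, str):
        return env[term]
    symbol, *args = term
    return POL[symbol](*(interpret(a, env) for a in args))

def strictly_decreasing_on_grid(lhs, rhs, bound=6):
    """Check Pol(lhs) > Pol(rhs) for all x, y below the bound (a sample, not a proof)."""
    return all(
        interpret(lhs, {"x": x, "y": y}) > interpret(rhs, {"x": x, "y": y})
        for x, y in product(range(bound), repeat=2)
    )
```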




We now recapitulate the DP framework and its core processors, and refer to, 
e.g., [2,16,17,23] for more details. In this paper, we restrict ourselves to the DP 
framework for innermost rewriting (denoted “$\overset{i}{\to}_R$”), because our adaption to 
the probabilistic setting relies on this evaluation strategy (see Sect. 4.1). 


Definition 3 (Dependency Pair). Let $R$ be a (finite) TRS. We decompose 
its signature $\Sigma = \Sigma_C \uplus \Sigma_D$ such that $f \in \Sigma_D$ if $f = root(\ell)$ for some rule 
$\ell \to r \in R$. The symbols in $\Sigma_C$ and $\Sigma_D$ are called constructors and defined 
symbols, respectively. For every $f \in \Sigma_D$, we introduce a fresh tuple symbol $f^\#$ 
of the same arity. Let $\Sigma^\#$ denote the set of all tuple symbols. To ease readability, 
we often write F instead of $f^\#$. For any term $t = f(t_1, \ldots, t_n) \in T(\Sigma, \mathcal{V})$ with 
$f \in \Sigma_D$, let $t^\# = f^\#(t_1, \ldots, t_n)$. Moreover, for any $r \in T(\Sigma, \mathcal{V})$, let $Sub_D(r)$ 
be the set of all subterms of $r$ with defined root symbol. For a rule $\ell \to r$ with 
$Sub_D(r) = \{t_1, \ldots, t_n\}$, one obtains the $n$ dependency pairs (DPs) $\ell^\# \to t_i^\#$ with 
$1 \leq i \leq n$. $DP(R)$ denotes the set of all dependency pairs of $R$. 


Example 4. For the TRS $R_{div}$ from Example 2, we get the following dependency 
pairs. 

M(s(x), s(y)) → M(x, y) (5) 
D(s(x), s(y)) → M(x, y) (6) 
D(s(x), s(y)) → D(minus(x, y), s(y)) (7) 

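
The construction of Definition 3 is easy to mechanise. The following sketch assumes a tuple encoding of terms and marks tuple symbols by appending "#" to the symbol name; these conventions and all function names are illustrative choices, not the representation used in AProVE.

```python
# Variables are strings; applications are tuples (symbol, arg1, ...).
R_DIV = {
    1: (("minus", "x", ("O",)), "x"),
    2: (("minus", ("s", "x"), ("s", "y")), ("minus", "x", "y")),
    3: (("div", ("O",), ("s", "y")), ("O",)),
    4: (("div", ("s", "x"), ("s", "y")),
        ("s", ("div", ("minus", "x", "y"), ("s", "y")))),
}

def defined_symbols(rules):
    """Root symbols of left-hand sides."""
    return {lhs[0] for lhs, _ in rules.values()}

def subterms(term):
    """All subterms of a term, including the term itself."""
    yield term
    if not isinstance(term, str):
        for arg in term[1:]:
            yield from subterms(arg)

def sharp(term):
    """Replace the root symbol f by its tuple symbol f#."""
    return (term[0] + "#",) + term[1:]

def dependency_pairs(rules):
    defined = defined_symbols(rules)
    pairs = []
    for lhs, rhs in rules.values():
        seen = []  # Sub_D(r) is a set, so repeated subterms count once
        for sub in subterms(rhs):
            if not isinstance(sub, str) and sub[0] in defined and sub not in seen:
                seen.append(sub)
                pairs.append((sharp(lhs), sharp(sub)))
    return pairs
```

Applied to R_DIV, this yields exactly three pairs, corresponding to (5), (6), and (7) with minus# and div# written for M and D.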
The DP framework uses DP problems $(D, R)$ where $D$ is a (finite) set of 
DPs and $R$ is a (finite) TRS. A (possibly infinite) sequence $t_0^\#, t_1^\#, t_2^\#, \ldots$ with 
$t_i^\# \overset{i}{\to}_{D,R} \circ \overset{i}{\to}{}^{*}_{R}\ t_{i+1}^\#$ for all $i$ is an (innermost) $(D, R)$-chain. Here, $\overset{i}{\to}_{D,R}$ is the 
restriction of $\to_D$ to rewrite steps where the used redex is in normal form w.r.t. 
$R$. A chain represents subsequent “function calls” in evaluations. Between two 
function calls (corresponding to steps with $D$) one can evaluate the arguments 
with $R$. For example, D(s²(O), s(O)), D(s(O), s(O)) is a $(DP(R_{div}), R_{div})$-chain, 
as D(s²(O), s(O)) $\overset{i}{\to}_{DP(R_{div}), R_{div}}$ D(minus(s(O), O), s(O)) $\overset{i}{\to}{}^{*}_{R_{div}}$ D(s(O), s(O)), where 
s²(O) is s(s(O)). 

A DP problem $(D, R)$ is called innermost terminating (iTerm) if there is no 
infinite innermost $(D, R)$-chain. The main result on dependency pairs is the chain 
criterion which states that a TRS $R$ is iTerm iff $(DP(R), R)$ is iTerm. The key 
idea of the DP framework is a divide-and-conquer approach which applies DP 
processors to transform DP problems into simpler sub-problems. A DP processor 
Proc has the form $Proc(D, R) = \{(D_1, R_1), \ldots, (D_n, R_n)\}$, where $D, D_1, \ldots, D_n$ 
are sets of dependency pairs and $R, R_1, \ldots, R_n$ are TRSs. A processor Proc is 
sound if $(D, R)$ is iTerm whenever $(D_i, R_i)$ is iTerm for all $1 \leq i \leq n$. It is 
complete if $(D_i, R_i)$ is iTerm for all $1 \leq i \leq n$ whenever $(D, R)$ is iTerm. 

So given a TRS R, one starts with the initial DP problem (DP(R), R) and 
applies sound (and preferably complete) DP processors repeatedly until all sub- 
problems are “solved” (i.e., sound processors transform them to the empty set). 
This allows for modular termination proofs, since different techniques can be 
applied on each resulting “sub-problem” $(D_i, R_i)$. The following three theorems 
recapitulate the three most important processors of the DP framework. 

The (innermost) $(D, R)$-dependency graph is a control flow graph that indicates 
which dependency pairs can be used after each other in a chain. Its 
node set is $D$ and there is an edge from $\ell_1^\# \to t_1^\#$ to $\ell_2^\# \to t_2^\#$ if there exist 
substitutions $\sigma_1, \sigma_2$ such that $t_1^\# \sigma_1 \overset{i}{\to}{}^{*}_{R}\ \ell_2^\# \sigma_2$, and both $\ell_1^\# \sigma_1$ and $\ell_2^\# \sigma_2$ 
are in normal form w.r.t. $R$. Any infinite $(D, R)$-chain corresponds to an 
infinite path in the dependency graph, and since the graph is finite, this 
infinite path must end in some strongly connected component (SCC).¹ 
Hence, it suffices to consider the SCCs of this graph independently. The 
$(DP(R_{div}), R_{div})$-dependency graph can be seen on the right. 

[Figure: the $(DP(R_{div}), R_{div})$-dependency graph over the nodes (5), (6), and (7).] 


Theorem 5 (Dep. Graph Processor). For the SCCs $D_1, \ldots, D_n$ of the 
$(D, R)$-dependency graph, $Proc_{DG}(D, R) = \{(D_1, R), \ldots, (D_n, R)\}$ is sound and 
complete. 


While the exact dependency graph is not computable in general, there are several 
techniques to over-approximate it automatically, see, e.g., [2,17,23]. In our 
example, applying $Proc_{DG}$ to the initial problem $(DP(R_{div}), R_{div})$ results in the 
smaller problems $(\{(5)\}, R_{div})$ and $(\{(7)\}, R_{div})$ that can be treated separately. 
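
To make the SCC decomposition concrete, the sketch below uses a deliberately crude over-approximation of the dependency graph: it draws an edge whenever the root tuple symbol of the first pair's right-hand side equals the root of the second pair's left-hand side. This is an assumption chosen for brevity (it is sound because rewriting with R cannot change a root tuple symbol), not the finer approximation used by actual tools; all names are hypothetical.

```python
# Dependency pairs (5)-(7) of R_div; variables are strings, applications are tuples.
DPS = {
    5: (("M", ("s", "x"), ("s", "y")), ("M", "x", "y")),
    6: (("D", ("s", "x"), ("s", "y")), ("M", "x", "y")),
    7: (("D", ("s", "x"), ("s", "y")), ("D", ("minus", "x", "y"), ("s", "y"))),
}

def edges(dps):
    """Over-approximated edges: root of the first rhs equals root of the second lhs."""
    return {(a, b)
            for a, (_, rhs) in dps.items()
            for b, (lhs, _) in dps.items()
            if rhs[0] == lhs[0]}

def reachable(start, edge_set):
    """Nodes reachable from start via a non-empty path."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for a, b in edge_set:
            if a == node and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

def sccs(dps):
    """SCCs in the sense of maximal cycles: nodes outside every cycle are dropped."""
    edge_set = edges(dps)
    reach = {n: reachable(n, edge_set) for n in dps}
    components = []
    for n in sorted(dps):
        if n in reach[n]:  # n lies on a cycle
            component = frozenset(m for m in reach[n] if n in reach[m])
            if component not in components:
                components.append(component)
    return components
```

On this example the approximation already separates the problem into the two components {(5)} and {(7)}; pair (6) lies on no cycle and is discarded.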

The next processor removes rules that cannot be used to evaluate right-hand 
sides of dependency pairs when their variables are instantiated with normal 
forms. 


1 Here, a set $D'$ of dependency pairs is an SCC if it is a maximal cycle, i.e., it is a 
maximal set such that for any $\ell_1^\# \to t_1^\#$ and $\ell_2^\# \to t_2^\#$ in $D'$ there is a non-empty 
path from $\ell_1^\# \to t_1^\#$ to $\ell_2^\# \to t_2^\#$ which only traverses nodes from $D'$. 
Theorem 6 (Usable Rules Processor). Let $R$ be a TRS. For every $f \in 
\Sigma \uplus \Sigma^\#$ let $Rules_R(f) = \{\ell \to r \in R \mid root(\ell) = f\}$. For any $t \in T(\Sigma \uplus \Sigma^\#, \mathcal{V})$, 
its usable rules $U_R(t)$ are the smallest set such that $U_R(x) = \emptyset$ for all $x \in \mathcal{V}$ 
and $U_R(f(t_1, \ldots, t_n)) = Rules_R(f) \cup \bigcup_{i=1}^{n} U_R(t_i) \cup \bigcup_{\ell \to r \in Rules_R(f)} U_R(r)$. The 
usable rules for the DP problem $(D, R)$ are $U(D, R) = \bigcup_{\ell^\# \to t^\# \in D} U_R(t^\#)$. Then 
$Proc_{UR}(D, R) = \{(D, U(D, R))\}$ is sound but not complete.² 


For the DP problem $(\{(7)\}, R_{div})$ only the minus-rules are usable and thus
$Proc_{UR}(\{(7)\}, R_{div}) = \{(\{(7)\}, \{(1), (2)\})\}$. For $(\{(5)\}, R_{div})$ there are no usable
rules at all, and thus $Proc_{UR}(\{(5)\}, R_{div}) = \{(\{(5)\}, \emptyset)\}$.
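
A minimal sketch of the usable rules computation from Theorem 6. The encoding is hypothetical: variables are strings, function applications are tuples, and the rules of R_div are written down under the assumption that they are the non-probabilistic counterparts of rules (12)-(15) of Example 23, numbered (1)-(4). The right-hand sides used for the DPs (5) and (7) are likewise assumptions.

```python
# Terms: variables are strings, applications are tuples (symbol, arg1, ...).
RULES = {  # ASSUMED numbering (1)-(4) of R_div
    1: (("minus", "x", ("O",)), "x"),
    2: (("minus", ("s", "x"), ("s", "y")), ("minus", "x", "y")),
    3: (("div", ("O",), ("s", "y")), ("O",)),
    4: (("div", ("s", "x"), ("s", "y")),
        ("s", ("div", ("minus", "x", "y"), ("s", "y")))),
}

def usable_rules(term, rules, seen=None):
    """Smallest set of rule indices closed under the equations of Theorem 6."""
    seen = set() if seen is None else seen
    if isinstance(term, str):              # U(x) is empty for variables
        return seen
    root, *args = term
    for idx, (lhs, rhs) in rules.items():
        if lhs[0] == root and idx not in seen:
            seen.add(idx)                  # Rules(f)
            usable_rules(rhs, rules, seen) # usable rules of right-hand sides
    for arg in args:                       # usable rules of the arguments
        usable_rules(arg, rules, seen)
    return seen

rhs_7 = ("D", ("minus", "x", "y"), ("s", "y"))  # ASSUMED right-hand side of (7)
rhs_5 = ("M", "x", "y")                         # ASSUMED right-hand side of (5)
```

With this encoding, `usable_rules(rhs_7, RULES)` yields the two minus-rules and `usable_rules(rhs_5, RULES)` is empty, in line with the example above.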

The last processor adapts classical orderings like polynomial interpretations 
to DP problems. In contrast to their direct application in Theorem 1, we may 
now use weakly monotonic polynomials $f_{Pol}$ that do not have to depend on all of
their arguments. The reduction pair processor requires that all rules and depen- 
dency pairs are weakly decreasing and it removes those DPs that are strictly 
decreasing. 


Theorem 7 (Reduction Pair Processor with Polynomial Interpretations).
Let $Pol : T(\Sigma \uplus \Sigma^\#, V) \to \mathbb{N}[V]$ be a weakly monotonic polynomial
interpretation (i.e., $x \geq y$ implies $f_{Pol}(\ldots, x, \ldots) \geq f_{Pol}(\ldots, y, \ldots)$ for all
$f \in \Sigma \uplus \Sigma^\#$). Let $D = D_{\geq} \uplus D_{>}$ with $D_{>} \neq \emptyset$ such that:

(1) For every $\ell \to r \in R$, we have $Pol(\ell) \geq Pol(r)$.
(2) For every $\ell^\# \to t^\# \in D$, we have $Pol(\ell^\#) \geq Pol(t^\#)$.
(3) For every $\ell^\# \to t^\# \in D_{>}$, we have $Pol(\ell^\#) > Pol(t^\#)$.

Then $Proc_{RP}(D, R) = \{(D_{\geq}, R)\}$ is sound and complete.


The constraints of the reduction pair processor for the remaining DP problems
$(\{(7)\}, \{(1), (2)\})$ and $(\{(5)\}, \emptyset)$ are satisfied by the polynomial interpretation
which maps O to 0, s(x) to x + 1, and all other non-constant function
symbols to the projection on their first arguments. Since (7) and (5) are strictly
decreasing, $Proc_{RP}$ transforms both $(\{(7)\}, \{(1), (2)\})$ and $(\{(5)\}, \emptyset)$ into DP
problems of the form $(\emptyset, \ldots)$. As $Proc_{DG}(\emptyset, \ldots) = \emptyset$ and all processors used are
sound, this means that there is no infinite innermost chain for the initial DP
problem $(DP(R_{div}), R_{div})$ and thus, $R_{div}$ is innermost terminating.


3 Probabilistic Term Rewriting 


Now we recapitulate probabilistic TRSs [3,9,14] and present a novel criterion to
prove almost-sure termination automatically by adapting the direct application
of polynomial interpretations from Theorem 1 to PTRSs. In contrast to TRSs,
a PTRS has finite⁴ multi-distributions on the right-hand side of rewrite rules.


² For a complete version of the usable rules processor, one has to use a more involved
notion of DP problems with more components that we omit here for readability [16].

³ In this paper, we only regard the reduction pair processor with polynomial interpretations,
because for most other classical orderings it is not clear how to extend them
to probabilistic TRSs, where one has to consider “expected values of terms”.


Definition 8 (Multi-Distribution). A finite multi-distribution $\mu$ on a set
$A \neq \emptyset$ is a finite multiset of pairs (p : a), where $0 < p \leq 1$ is a probability
and $a \in A$, such that $\sum_{(p:a) \in \mu} p = 1$. FDist(A) is the set of all finite
multi-distributions on A. For $\mu \in FDist(A)$, its support is the multiset
$Supp(\mu) = \{a \mid (p : a) \in \mu \text{ for some } p\}$.


Definition 9 (PTRS). A probabilistic rewrite rule is a pair $\ell \to \mu \in
T(\Sigma, V) \times FDist(T(\Sigma, V))$ such that $\ell \notin V$ and $V(r) \subseteq V(\ell)$ for every
$r \in Supp(\mu)$. A probabilistic TRS (PTRS) is a finite set R of probabilistic
rewrite rules. Similar to TRSs, the PTRS R induces a rewrite relation
$\to_R\ \subseteq T(\Sigma, V) \times FDist(T(\Sigma, V))$ where $s \to_R \{p_1 : t_1, \ldots, p_k : t_k\}$ if there
is a position $\pi$, a rule $\ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in R$, and a substitution $\sigma$ such
that $s|_\pi = \ell\sigma$ and $t_j = s[r_j\sigma]_\pi$ for all $1 \leq j \leq k$. We call $s \to_R \mu$ an innermost
rewrite step (denoted $s \overset{i}{\to}_R \mu$) if every proper subterm of the used redex $\ell\sigma$ is in
normal form w.r.t. R.


Example 10. As an example, consider the PTRS $R_{rw}$ with the only rule $g(x) \to
\{1/2 : x,\ 1/2 : g(g(x))\}$, which corresponds to a symmetric random walk.


As proposed in [3], we lift $\to_R$ to a rewrite relation between multi-distributions
in order to track all probabilistic rewrite sequences (up to non-determinism)
at once. For any $0 < p \leq 1$ and any $\mu \in FDist(A)$, let
$p \cdot \mu = \{(p \cdot q : a) \mid (q : a) \in \mu\}$.


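Definition 11 (Lifting). The lifting $\rightrightarrows\ \subseteq FDist(T(\Sigma, V)) \times FDist(T(\Sigma, V))$
of a relation $\to\ \subseteq T(\Sigma, V) \times FDist(T(\Sigma, V))$ is the smallest relation with:

• If $t \in T(\Sigma, V)$ is in normal form w.r.t. $\to$, then $\{1 : t\} \rightrightarrows \{1 : t\}$.

• If $t \to \mu$, then $\{1 : t\} \rightrightarrows \mu$.

• If for all $1 \leq j \leq k$ there are $\mu_j, \nu_j \in FDist(T(\Sigma, V))$ with $\mu_j \rightrightarrows \nu_j$ and
$0 < p_j \leq 1$ with $\sum_{1 \leq j \leq k} p_j = 1$, then $\bigcup_{1 \leq j \leq k} p_j \cdot \mu_j \rightrightarrows \bigcup_{1 \leq j \leq k} p_j \cdot \nu_j$.


For a PTRS R, we write $\rightrightarrows_R$ and $\overset{i}{\rightrightarrows}_R$ for the liftings of $\to_R$ and $\overset{i}{\to}_R$,
respectively.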


Example 12. For instance, we obtain the following $\rightrightarrows_{R_{rw}}$-rewrite sequence:

$\{1 : g(O)\} \rightrightarrows_{R_{rw}} \{1/2 : O,\ 1/2 : g^2(O)\} \rightrightarrows_{R_{rw}} \{1/2 : O,\ 1/4 : g(O),\ 1/4 : g^3(O)\}$
$\rightrightarrows_{R_{rw}} \{1/2 : O,\ 1/8 : O,\ 1/8 : g^2(O),\ 1/8 : g^2(O),\ 1/8 : g^4(O)\}$

Note that the two occurrences of O and $g^2(O)$ in the multi-distribution above
could be rewritten differently if the PTRS had rules resulting in different terms.
So it should be distinguished from $\{5/8 : O,\ 1/4 : g^2(O),\ 1/8 : g^4(O)\}$.


⁴ Since our goal is the automation of termination analysis, in this paper we restrict
ourselves to finite PTRSs with finite multi-distributions.

To express the concept of almost-sure termination, one has to determine the
probability for normal forms in a multi-distribution.


Definition 13 ($|\mu|_R$). For a PTRS R, $NF_R \subseteq T(\Sigma, V)$ denotes the set of all
normal forms w.r.t. R. For any $\mu \in FDist(T(\Sigma, V))$, let $|\mu|_R = \sum_{(p:t) \in \mu,\ t \in NF_R} p$.


Example 14. Consider the multi-distribution $\mu = \{1/2 : O,\ 1/8 : O,\ 1/8 : g^2(O),\ 1/8 :
g^2(O),\ 1/8 : g^4(O)\}$ from Example 12 and $R_{rw}$ from Example 10. Then $|\mu|_{R_{rw}} =
1/2 + 1/8 = 5/8$.
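
The rewrite sequence of Example 12 and the value computed in Example 14 can be replayed with a small sketch. The encoding is an assumption made for illustration: a term $g^n(O)$ is represented by the integer n, a multi-distribution by a list of (probability, n) pairs, and `lift_step` and `nf_probability` are hypothetical names.

```python
from fractions import Fraction

def lift_step(mu):
    """One lifted rewrite step with R_rw: rewrite every entry that is not a normal form."""
    nxt = []
    for p, n in mu:                      # entry (p : g^n(O))
        if n == 0:                       # O is a normal form and is kept
            nxt.append((p, n))
        else:                            # g(x) -> {1/2 : x, 1/2 : g(g(x))}
            nxt.append((p / 2, n - 1))
            nxt.append((p / 2, n + 1))
    return nxt

def nf_probability(mu):
    """|mu|_R: total probability of the normal forms in mu."""
    return sum((p for p, n in mu if n == 0), Fraction(0))

mu = [(Fraction(1), 1)]                  # {1 : g(O)}
for _ in range(3):
    mu = lift_step(mu)
```

After three steps, `mu` contains the five entries of the last multi-distribution in Example 12, with the two occurrences of O and of $g^2(O)$ kept separate, and `nf_probability(mu)` is 5/8.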


Definition 15 ((Innermost) AST). Let R be a PTRS and $(\mu_n)_{n \in \mathbb{N}}$ be an infinite
$\rightrightarrows_R$-rewrite sequence, i.e., $\mu_n \rightrightarrows_R \mu_{n+1}$ for all $n \in \mathbb{N}$. Note that $\lim_{n \to \infty} |\mu_n|_R$
exists, since $|\mu_n|_R \leq |\mu_{n+1}|_R \leq 1$ for all $n \in \mathbb{N}$. R is almost-surely terminating
(AST) (innermost almost-surely terminating (iAST)) if $\lim_{n \to \infty} |\mu_n|_R = 1$ holds
for every infinite $\rightrightarrows_R$-rewrite sequence ($\overset{i}{\rightrightarrows}_R$-rewrite sequence) $(\mu_n)_{n \in \mathbb{N}}$.


Example 16. For the (unique) infinite extension of the $\rightrightarrows_{R_{rw}}$-rewrite sequence
$(\mu_n)_{n \in \mathbb{N}}$ in Example 12, we have $\lim_{n \to \infty} |\mu_n|_{R_{rw}} = 1$. Indeed, $R_{rw}$ is AST (but
not PAST, i.e., the expected number of rewrite steps is infinite for every term
containing g).


Theorem 17 introduces a novel technique to prove AST automatically using 
a direct application of polynomial interpretations. 


Theorem 17 (Proving AST with Polynomial Interpretations). Let R be
a PTRS, let $Pol : T(\Sigma, V) \to \mathbb{N}[V]$ be a monotonic, multilinear⁵ polynomial
interpretation (i.e., for all $f \in \Sigma$, all monomials of $f_{Pol}(x_1, \ldots, x_n)$ have the
form $c \cdot x_1^{e_1} \cdot \ldots \cdot x_n^{e_n}$ with $c \in \mathbb{N}$ and $e_1, \ldots, e_n \in \{0, 1\}$). If for every rule
$\ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in R$,

(1) there exists a $1 \leq j \leq k$ with $Pol(\ell) > Pol(r_j)$ and
(2) $Pol(\ell) \geq \sum_{1 \leq j \leq k} p_j \cdot Pol(r_j)$,

then R is AST.


In [3], it was shown that PAST can be proved by using multilinear poly- 
nomials and requiring a strict decrease in the expected value of each rule. In 
contrast, we only require a weak decrease of the expected value in (2) and in 
addition, at least one term in the support of the right-hand side must become 
strictly smaller (1). As mentioned, the proof for Theorem 17 (and for all our 
other new results and observations) can be found in [28]. The proof idea is based 
on [32], but it extends their approach from while-programs on integers to terms. 
However, in contrast to [32], PTRSs can only deal with constant probabilities, 
since all variables stand for terms, not for numbers. Note that the constraints 
(1) and (2) of our new criterion in Theorem 17 are equivalent to the constraint 
of the classical Theorem 1 in the special case where the PTRS is in fact a TRS 
(i.e., all rules have the form $\ell \to \{1 : r\}$).


⁵ As in [3], multilinearity ensures “monotonicity” w.r.t. expected values, since multilinearity
implies $f_{Pol}(\ldots, \sum_{1 \leq j \leq k} p_j \cdot Pol(r_j), \ldots) = \sum_{1 \leq j \leq k} p_j \cdot Pol(f(\ldots, r_j, \ldots))$.

Example 18. To prove that $R_{rw}$ is AST with Theorem 17, we can use the polynomial
interpretation that maps g(x) to x + 1 and O to 0.
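
For the single rule $g(x) \to \{1/2 : x,\ 1/2 : g(g(x))\}$ of $R_{rw}$, the two conditions of Theorem 17 can be checked by hand with this interpretation. The following worked calculation is added for illustration and uses only $Pol(g(t)) = Pol(t) + 1$:

```latex
\begin{align*}
Pol(g(x)) &= x + 1 \;>\; x = Pol(x)
  && \text{(condition (1), with } j = 1\text{)}\\
Pol(g(x)) &= x + 1 \;\geq\; \tfrac{1}{2} \cdot x + \tfrac{1}{2} \cdot (x + 2)
  = \tfrac{1}{2} \cdot Pol(x) + \tfrac{1}{2} \cdot Pol(g(g(x)))
  && \text{(condition (2))}
\end{align*}
```

Condition (2) holds with equality here, so the expected value does not decrease strictly. This is consistent with $R_{rw}$ being AST but not PAST.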


4 Probabilistic Dependency Pairs 


We introduce our new adaption of DPs to the probabilistic setting in Sect. 4.1. 
Then we present the processors for the probabilistic DP framework in Sect. 4.2. 


4.1 Dependency Tuples and Chains for Probabilistic Term 
Rewriting 


We first show why straightforward adaptions are unsound. A natural idea to
define DPs for probabilistic rules $\ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in R$ would be (8) or
(9):

$\{\ell^\# \to \{p_1 : r_1, \ldots, p_j : t_j^\#, \ldots, p_k : r_k\} \mid t_j \in Sub_D(r_j) \text{ with } 1 \leq j \leq k\}$   (8)
$\{\ell^\# \to \{p_1 : t_1^\#, \ldots, p_k : t_k^\#\} \mid t_j \in Sub_D(r_j) \text{ for all } 1 \leq j \leq k\}$   (9)

For (9), if $Sub_D(r_j) = \emptyset$, then we insert a fresh constructor $\bot$ into $Sub_D(r_j)$
that does not occur in R. So in both (8) and (9), we replace $r_j$ by a single
term $t_j^\#$ in the right-hand side. The following example shows that this notion of
probabilistic DPs does not yield a sound chain criterion. Consider the PTRSs
$R_1$ and $R_2$:

$R_1 = \{g \to \{1/2 : O,\ 1/2 : f(g, g)\}\}$     $R_2 = \{g \to \{1/2 : O,\ 1/2 : f(g, g, g)\}\}$   (10)

$R_1$ is AST since it corresponds to a symmetric random walk stopping at 0,
where the number of gs denotes the current position. In contrast, $R_2$ is not AST
as it corresponds to a random walk where there is an equal chance of reducing
the number of gs by 1 or increasing it by 2. For both $R_1$ and $R_2$, (8) and (9)
would result in the only dependency pair $G \to \{1/2 : O,\ 1/2 : G\}$ and $G \to \{1/2 :
\bot,\ 1/2 : G\}$, resp. Rewriting with this DP is clearly AST, since it corresponds
to a program that flips a coin until one gets head and then terminates. So the
definitions (8) and (9) would not yield a sound approach for proving AST.
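
The difference between $R_1$ and $R_2$ can also be seen numerically. The sketch below rests on an assumption that is not spelled out in the text: since every occurrence of g is rewritten independently of the others, the probability q that a single g eventually disappears is the least non-negative solution of $q = 1/2 + 1/2 \cdot q^k$, where k is the number of gs on the right-hand side (the standard extinction probability of a branching process). The function name `termination_probability` is hypothetical.

```python
def termination_probability(k, iterations=100_000):
    """Approximate the least fixed point of q = 1/2 + 1/2 * q**k by iteration from 0."""
    q = 0.0
    for _ in range(iterations):
        q = 0.5 + 0.5 * q ** k
    return q

q1 = termination_probability(2)   # R_1: g -> {1/2 : O, 1/2 : f(g, g)}
q2 = termination_probability(3)   # R_2: g -> {1/2 : O, 1/2 : f(g, g, g)}
```

For k = 2 the iteration approaches 1 (slowly), whereas for k = 3 it converges to $(\sqrt{5} - 1)/2 \approx 0.618$, so under this model $R_2$ terminates with probability strictly below 1.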

$R_1$ and $R_2$ show that the number of occurrences of the same subterm in the
right-hand side r of a rule matters for AST. Thus, we now regard the multiset
$MSub_D(r)$ of all subterms of r with defined root symbol to ensure that multiple
occurrences of the same subterm in r are taken into account. Moreover, instead 
of pairs we regard dependency tuples which consider all subterms with defined 
root in r at once. Dependency tuples were already used when adapting DPs for 
complexity analysis of (non-probabilistic) TRSs [37]. We now adapt them to the 
probabilistic setting and present a novel rewrite relation for dependency tuples. 


Definition 19 (Transformation dp). If $MSub_D(r) = \{t_1, \ldots, t_n\}$, then we
define $dp(r) = c_n(t_1^\#, \ldots, t_n^\#)$. To make dp(r) unique, we use the lexicographic
ordering < on positions where $t_i = r|_{\pi_i}$ and $\pi_1 < \ldots < \pi_n$. Here, we extend $\Sigma_C$
by fresh compound constructor symbols $c_n$ of arity n for $n \in \mathbb{N}$.

When rewriting a subterm $t_i^\#$ of $c_n(t_1^\#, \ldots, t_n^\#)$ with a dependency tuple, one
obtains terms with nested compound symbols. To abstract from nested compound
symbols and from the order of their arguments, we introduce the following
normalization.


Definition 20 (Normalizing Compound Terms). For any term t, its content
cont(t) is the multiset defined by $cont(c_n(t_1, \ldots, t_n)) = cont(t_1) \cup \ldots \cup
cont(t_n)$ and cont(t) = {t} otherwise. For any term t with $cont(t) = \{t_1, \ldots, t_n\}$,
the term $c_n(t_1, \ldots, t_n)$ is a normalization of t. For two terms t, t', we define $t \approx t'$
if cont(t) = cont(t'). We define $\approx$ on multi-distributions in a similar way: whenever
$t_j \approx t_j'$ for all $1 \leq j \leq k$, then $\{p_1 : t_1, \ldots, p_k : t_k\} \approx \{p_1 : t_1', \ldots, p_k : t_k'\}$.


So for example, $c_3(x, x, y)$ is a normalization of $c_2(c_1(x), c_2(x, y))$. We do
not distinguish between terms and multi-distributions that are equal w.r.t. $\approx$
and we write $c_n(t_1, \ldots, t_n)$ for any term t with a compound root symbol where
$cont(t) = \{t_1, \ldots, t_n\}$, i.e., we consider all such t to be normalized.
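
The content and normalization of compound terms can be sketched as follows. The encoding is an assumption for illustration only: terms are nested tuples, variables are strings, and every compound symbol $c_n$ is written as a tuple with head `"c"` (the arity is implicit in the number of arguments).

```python
from collections import Counter

def is_compound(term):
    # ASSUMPTION: compound symbols c_n are encoded as tuples with head "c".
    return isinstance(term, tuple) and term[0] == "c"

def cont(term):
    """Content of a term as a multiset (Counter), following Definition 20."""
    if is_compound(term):
        result = Counter()
        for arg in term[1:]:
            result += cont(arg)
        return result
    return Counter([term])

def normalize(term):
    """One normalization c_n(t_1, ..., t_n); arguments are sorted only to fix an order."""
    return ("c", *sorted(cont(term).elements(), key=repr))

nested = ("c", ("c", "x"), ("c", "x", "y"))   # c_2(c_1(x), c_2(x, y))
flat = ("c", "x", "x", "y")                    # c_3(x, x, y)
```

Both terms have the content {x, x, y}, so they are identified by $\approx$, and `normalize(nested)` returns the flat term.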

For any rule $\ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in R$, the natural idea would be to
define its dependency tuple (DT) as $\ell^\# \to \{p_1 : dp(r_1), \ldots, p_k : dp(r_k)\}$. Then
innermost chains in the probabilistic setting would result from alternating a
DT-step with an arbitrary number of R-steps (using $\overset{i}{\rightrightarrows}_R$). However, such chains
would not necessarily correspond to the original rewrite sequence and thus, the
resulting chain criterion would not be sound.


Example 21. Consider the PTRS $R_3 = \{f(O) \to \{1 : f(a)\},\ a \to \{1/2 : b_1,\ 1/2 :
b_2\},\ b_1 \to \{1 : O\},\ b_2 \to \{1 : f(a)\}\}$. Its DTs would be $D_3 = \{F(O) \to
\{1 : c_2(F(a), A)\},\ A \to \{1/2 : c_1(B_1),\ 1/2 : c_1(B_2)\},\ B_1 \to \{1 : c_0\},\ B_2 \to \{1 :
c_2(F(a), A)\}\}$. $R_3$ is not iAST, because one can extend the rewrite sequence

$\{1 : f(O)\} \overset{i}{\rightrightarrows}_{R_3} \{1 : f(a)\} \overset{i}{\rightrightarrows}_{R_3} \{1/2 : f(b_1),\ 1/2 : f(b_2)\} \overset{i}{\rightrightarrows}_{R_3} \{1/2 : f(O),\ 1/2 : f(f(a))\}$   (11)

to an infinite sequence without normal forms. The resulting chain starts with

$\{1 : c_1(F(O))\}$
$\overset{i}{\rightrightarrows}_{D_3} \{1 : c_2(F(a), A)\}$
$\overset{i}{\rightrightarrows}_{D_3} \{1/2 : c_2(F(a), B_1),\ 1/2 : c_2(F(a), B_2)\}$
$\overset{i}{\rightrightarrows}_{R_3} \{1/4 : c_2(F(b_1), B_1),\ 1/4 : c_2(F(b_2), B_1),\ 1/4 : c_2(F(b_1), B_2),\ 1/4 : c_2(F(b_2), B_2)\}$

The second and third term in the last distribution do not correspond to terms
in the original rewrite sequence (11). After the next $D_3$-step which removes $B_1$,
no further $D_3$-step can be applied to the second term $c_2(F(b_2), B_1)$ anymore, because $b_2$
cannot be rewritten to O. Thus, the resulting chain criterion would be unsound,
as every chain $(\mu_n)_{n \in \mathbb{N}}$ in this example contains such $D_3$-normal forms and
therefore, it is AST (i.e., $\lim_{n \to \infty} |\mu_n|_{D_3} = 1$ where $|\mu_n|_{D_3}$ is the probability for
$D_3$-normal forms in $\mu_n$). So we have to ensure that when A is rewritten to $B_1$
via a DT from $D_3$, then the “copy” a of the redex A is rewritten via $R_3$ to the
corresponding term $b_1$ instead of $b_2$. Thus, after the step with $\overset{i}{\rightrightarrows}_{R_3}$, we should
have $c_2(F(b_1), B_1)$ and $c_2(F(b_2), B_2)$, but not $c_2(F(b_2), B_1)$ or $c_2(F(b_1), B_2)$.

Therefore, for our new adaption of DPs to the probabilistic setting, we operate
on pairs. Instead of having a rule $\ell \to \{p_1 : r_1, \ldots, p_k : r_k\}$ from R and its
corresponding dependency tuple $\ell^\# \to \{p_1 : dp(r_1), \ldots, p_k : dp(r_k)\}$ separately,
we couple them together to $(\ell^\#, \ell) \to \{p_1 : (dp(r_1), r_1), \ldots, p_k : (dp(r_k), r_k)\}$.
This type of rewrite system is called a probabilistic pair term rewrite system
(PPTRS), and its rules are called coupled dependency tuples. Our new DP framework
works on (probabilistic) DP problems (P, S), where P is a PPTRS and S
is a PTRS.


Definition 22 (Coupled Dependency Tuple). Let R be a PTRS. For every
$\ell \to \mu = \{p_1 : r_1, \ldots, p_k : r_k\} \in R$, its coupled dependency tuple (or simply
dependency tuple, DT) is $DT(\ell \to \mu) = (\ell^\#, \ell) \to \{p_1 : (dp(r_1), r_1), \ldots, p_k :
(dp(r_k), r_k)\}$. The set of all coupled dependency tuples of R is denoted by DT(R).


Example 23. The following PTRS $R_{pdiv}$ adapts $R_{div}$ to the probabilistic setting.

minus(x, O) → {1 : x}   (12)
minus(s(x), s(y)) → {1 : minus(x, y)}   (13)
div(O, s(y)) → {1 : O}   (14)
div(s(x), s(y)) → {1/2 : div(s(x), s(y)), 1/2 : s(div(minus(x, y), s(y)))}   (15)

In (15), we now do the actual rewrite step with a chance of 1/2 or the
terms stay the same. Our new probabilistic DP framework can prove automatically
that $R_{pdiv}$ is iAST, while (as in the non-probabilistic setting) a
direct application of polynomial interpretations via Theorem 17 fails. We get
$DT(R_{pdiv}) = \{(16), \ldots, (19)\}$:

(M(x, O), minus(x, O)) → {1 : (c_0, x)}   (16)
(M(s(x), s(y)), minus(s(x), s(y))) → {1 : (c_1(M(x, y)), minus(x, y))}   (17)
(D(O, s(y)), div(O, s(y))) → {1 : (c_0, O)}   (18)
(D(s(x), s(y)), div(s(x), s(y))) → {1/2 : (c_1(D(s(x), s(y))), div(s(x), s(y))),
    1/2 : (c_2(D(minus(x, y), s(y)), M(x, y)), s(div(minus(x, y), s(y))))}   (19)
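
The transformation dp of Definition 19 can be sketched for the right-hand sides of $R_{pdiv}$. The encoding is hypothetical (variables as strings, applications as tuples, compound symbols as tuples with head `"c"`), and the table `DEFINED` assumes that minus and div are the defined symbols, with tuple symbols M and D as in (16)-(19).

```python
DEFINED = {"minus": "M", "div": "D"}   # defined symbols and their tuple symbols

def defined_subterms(term):
    """Subterms with defined root, in lexicographic order of their positions."""
    if isinstance(term, str):          # variables have no defined root
        return []
    found = [term] if term[0] in DEFINED else []
    for arg in term[1:]:               # pre-order traversal = lexicographic order
        found += defined_subterms(arg)
    return found

def sharp(term):
    """Replace the defined root symbol by its tuple symbol."""
    return (DEFINED[term[0]], *term[1:])

def dp(term):
    """dp(r) = c_n(t_1#, ..., t_n#) for the defined subterms t_1, ..., t_n of r."""
    return ("c", *(sharp(t) for t in defined_subterms(term)))

# second right-hand side of rule (15): s(div(minus(x, y), s(y)))
rhs_15 = ("s", ("div", ("minus", "x", "y"), ("s", "y")))
```

Here `dp(rhs_15)` yields the encoding of $c_2(D(minus(x, y), s(y)), M(x, y))$, the first component of the second pair in (19).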


Definition 24 (PPTRS, $\overset{i}{\to}_{P,S}$). Let P be a finite set of rules of the form
$(\ell^\#, \ell) \to \{p_1 : (d_1, r_1), \ldots, p_k : (d_k, r_k)\}$. For every such rule, let $proj_1(P)$
contain $\ell^\# \to \{p_1 : d_1, \ldots, p_k : d_k\}$ and let $proj_2(P)$ contain $\ell \to \{p_1 : r_1, \ldots, p_k :
r_k\}$. If $proj_2(P)$ is a PTRS and $cont(d_j) \subseteq cont(dp(r_j))$ holds⁶ for all $1 \leq j \leq k$,
then P is a probabilistic pair term rewrite system (PPTRS).

Let S be a PTRS. Then a normalized term $c_n(s_1, \ldots, s_n)$ rewrites with the
PPTRS P to $\{p_1 : b_1, \ldots, p_k : b_k\}$ w.r.t. S (denoted $\overset{i}{\to}_{P,S}$) if there are an
$1 \leq i \leq n$, an $(\ell^\#, \ell) \to \{p_1 : (d_1, r_1), \ldots, p_k : (d_k, r_k)\} \in P$, a substitution $\sigma$
with $s_i = \ell^\#\sigma \in NF_S$, and for all $1 \leq j \leq k$ we have $b_j = c_n(t_1^j, \ldots, t_n^j)$ where

• $t_i^j = d_j\sigma$ for all $1 \leq j \leq k$, i.e., we rewrite the term $s_i$ using $proj_1(P)$.
• For every $1 \leq i' \leq n$ with $i \neq i'$ we have
  (i) $t_{i'}^j = s_{i'}$ for all $1 \leq j \leq k$ or
  (ii) $t_{i'}^j = s_{i'}[r_j\sigma]_\tau$ for all $1 \leq j \leq k$,
       if $s_{i'}|_\tau = \ell\sigma$ for some position $\tau$ and if $\ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in S$.
  So $s_{i'}$ stays the same in all $b_j$ or we can apply the rule from $proj_2(P)$ to
  rewrite $s_{i'}$ in all $b_j$, provided that this rule is also contained in S. Note that
  even if the rule is applicable, the term $s_{i'}$ can still stay the same in all $b_j$.


⁶ The reason for $cont(d_j) \subseteq cont(dp(r_j))$ instead of $cont(d_j) = cont(dp(r_j))$ is that in
this way processors can remove terms from the right-hand sides of DTs, see Theorem
32.

Example 25. For $R_3$ from Example 21, the (coupled) dependency tuple for the
f-rule is $(F(O), f(O)) \to \{1 : (c_2(F(a), A), f(a))\}$ and the DT for the a-rule is
$(A, a) \to \{1/2 : (c_1(B_1), b_1),\ 1/2 : (c_1(B_2), b_2)\}$. With the lifting $\overset{i}{\rightrightarrows}_{P,S}$ of $\overset{i}{\to}_{P,S}$, we
get the following sequence which corresponds to the rewrite sequence (11) from
Example 21.

$\{1 : c_1(F(O))\} \overset{i}{\rightrightarrows}_{DT(R_3),R_3} \{1 : c_2(F(a), A)\}$
$\overset{i}{\rightrightarrows}_{DT(R_3),R_3} \{1/2 : c_2(F(b_1), B_1),\ 1/2 : c_2(F(b_2), B_2)\}$   (20)

So with the PPTRS, when rewriting A to $B_1$ in the second step, we can simultaneously
rewrite the inner subterm a of F(a) to $b_1$ or keep a unchanged, but
we cannot rewrite a to $b_2$. This is ensured by $b_1$ in the second component of
$(A, a) \to \{1/2 : (c_1(B_1), b_1), \ldots\}$, since by Definition 24, if $s_{i'}$ contains $\ell\sigma$ at some
arbitrary position $\tau$, then one can (only) use the rule in the second component
of the DT to rewrite $\ell\sigma$ (i.e., here we have $s_{i'} = F(a)$, $s_i = A$, and $s_{i'}|_\tau = a$). A
similar observation holds when rewriting A to $B_2$. Recall that with the notion of
chains in Example 21, one cannot simulate every possible rewrite sequence, which
leads to unsoundness. In contrast, with the notion of coupled DTs and PPTRSs,
every possible rewrite sequence can be simulated which ensures soundness of the
chain criterion. Of course, due to the ambiguity in (i) and (ii) of Definition 24,
one could also create other “unsuitable” $\overset{i}{\rightrightarrows}_{DT(R_3),R_3}$-sequences where a is not
reduced to $b_1$ and $b_2$ in the second step, but is kept unchanged. This does not
affect the soundness of the chain criterion, since every rewrite sequence of the
original PTRS can be simulated by a “suitable” chain. To obtain completeness
of the chain criterion, one would have to avoid such “unsuitable” sequences.

We also introduce an analogous rewrite relation for PTRSs, where we can 
apply the same rule simultaneously to the same subterms in a single rewrite 
step. 


Definition 26 ($\overset{i}{\to}_S$). For a PTRS S and a normalized term $c_n(s_1, \ldots, s_n)$,
we define $c_n(s_1, \ldots, s_n) \overset{i}{\to}_S \{p_1 : b_1, \ldots, p_k : b_k\}$ if there are an $1 \leq i \leq n$, an
$\ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in S$, a position $\pi$, a substitution $\sigma$ with $s_i|_\pi = \ell\sigma$
such that every proper subterm of $\ell\sigma$ is in $NF_S$, and for all $1 \leq j \leq k$ we have
$b_j = c_n(t_1^j, \ldots, t_n^j)$ where

• $t_i^j = s_i[r_j\sigma]_\pi$ for all $1 \leq j \leq k$, i.e., we rewrite the term $s_i$ using S.
• For every $1 \leq i' \leq n$ with $i \neq i'$ we have
  (i) $t_{i'}^j = s_{i'}$ for all $1 \leq j \leq k$ or
  (ii) $t_{i'}^j = s_{i'}[r_j\sigma]_\tau$ for all $1 \leq j \leq k$, if $s_{i'}|_\tau = \ell\sigma$ for some position $\tau$.

So for example, the lifting $\overset{i}{\rightrightarrows}_S$ of $\overset{i}{\to}_S$ for $S = R_3$ rewrites $\{1 : c_2(f(a), a)\}$ to both
$\{1/2 : c_2(f(b_1), b_1),\ 1/2 : c_2(f(b_2), b_2)\}$ and $\{1/2 : c_2(f(a), b_1),\ 1/2 : c_2(f(a), b_2)\}$.

A straightforward adaption of “chains” to the probabilistic setting using
$\overset{i}{\rightrightarrows}_{P,S} \circ \overset{i}{\rightrightarrows}{}^{*}_{S}$ would force us to use steps with DTs from P at the same time
for all terms in a multi-distribution. Therefore, instead we view a rewrite
sequence on multi-distributions as a tree (e.g., the tree representation of
the rewrite sequence (20) from Example 25 is on the right). Regarding the
paths in this tree (which represent rewrite sequences of terms with certain
probabilities) allows us to adapt the idea of chains, i.e., that one uses only
finitely many S-steps before the next step with a DT from P.

[Figure: tree representation of (20) with root $1 : c_1(F(O))$, its child $1 : c_2(F(a), A)$, and the two leaves $1/2 : c_2(F(b_1), B_1)$ and $1/2 : c_2(F(b_2), B_2)$.]


Definition 27 (Chain Tree). $\mathfrak{T} = (V, E, L, P)$ is an (innermost) (P, S)-chain
tree if

1. $V \neq \emptyset$ is a possibly infinite set of nodes and $E \subseteq V \times V$ is a set of directed
   edges, such that (V, E) is a (possibly infinite) directed tree where $vE = \{w \mid
   (v, w) \in E\}$ is finite for every $v \in V$.
2. $L : V \to (0, 1] \times T(\Sigma \uplus \Sigma^\#, V)$ labels every node v by a probability $p_v$ and a
   term $t_v$. For the root $v \in V$ of the tree, we have $p_v = 1$.
3. $P \subseteq V \setminus Leaf$ (where Leaf are all leaves) is a subset of the inner nodes
   to indicate whether we use the PPTRS P or the PTRS S for the rewrite
   step. $S = V \setminus (Leaf \cup P)$ are all inner nodes that are not in P. Thus, $V =
   P \uplus S \uplus Leaf$.
4. For all $v \in P$: If $vE = \{w_1, \ldots, w_k\}$, then $t_v \overset{i}{\to}_{P,S} \{\frac{p_{w_1}}{p_v} : t_{w_1}, \ldots, \frac{p_{w_k}}{p_v} : t_{w_k}\}$.
5. For all $v \in S$: If $vE = \{w_1, \ldots, w_k\}$, then $t_v \overset{i}{\to}_{S} \{\frac{p_{w_1}}{p_v} : t_{w_1}, \ldots, \frac{p_{w_k}}{p_v} : t_{w_k}\}$.
6. Every infinite path in $\mathfrak{T}$ contains infinitely many nodes from P.


Conditions 1-5 ensure that the tree represents a valid rewrite sequence and
the last condition is the main property for chains.


Definition 28 ($|\mathfrak{T}|_{Leaf}$, iAST). For any innermost (P, S)-chain tree $\mathfrak{T}$ we
define $|\mathfrak{T}|_{Leaf} = \sum_{v \in Leaf} p_v$. We say that (P, S) is iAST if we have $|\mathfrak{T}|_{Leaf} = 1$
for every innermost (P, S)-chain tree $\mathfrak{T}$.

While we have $|\mathfrak{T}|_{Leaf} = 1$ for every finite chain tree $\mathfrak{T}$, for infinite chain trees $\mathfrak{T}$
we may have $|\mathfrak{T}|_{Leaf} < 1$ or even $|\mathfrak{T}|_{Leaf} = 0$ if $\mathfrak{T}$ has no leaf at all.
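
For a finite tree, the leaf probability of Definition 28 is a plain sum over the leaves. The following sketch uses a hypothetical `Node` class whose labels are the probabilities and (string renderings of the) terms of the rewrite sequence (20) from Example 25. It only illustrates the bookkeeping and does not check the rewrite conditions 4 and 5 of Definition 27.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class Node:
    prob: Fraction                 # p_v
    term: str                      # t_v, kept as a plain string here
    children: list = field(default_factory=list)

def leaf_probability(node):
    """Sum of p_v over all leaves below (and including) node."""
    if not node.children:
        return node.prob
    return sum((leaf_probability(c) for c in node.children), Fraction(0))

# finite tree for the rewrite sequence (20)
tree = Node(Fraction(1), "c1(F(O))", [
    Node(Fraction(1), "c2(F(a), A)", [
        Node(Fraction(1, 2), "c2(F(b1), B1)"),
        Node(Fraction(1, 2), "c2(F(b2), B2)"),
    ]),
])
```

As expected for a finite tree, `leaf_probability(tree)` is 1. Values below 1 can only arise for infinite trees, which this finite data structure cannot represent.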

With this new type of DTs and chain trees, we now obtain an analogous 
chain criterion to the non-probabilistic setting. 


Theorem 29 (Chain Criterion). A PTRS R is iAST if (DT(R), R) is iAST. 


In contrast to the non-probabilistic case, our chain criterion as presented 
in the paper is sound but not complete (i.e., we do not have “iff” in Theorem 
29). However, we also developed a refinement where our chain criterion is made 
complete by also storing the positions of the defined symbols in dp(r) [27]. In this 
way, one can avoid “unsuitable” chain trees, as discussed at the end of Example 
25. 

Our notion of DTs and chain trees is only suitable for innermost evaluation.
To see this, consider the PTRSs $R_1'$ and $R_2'$ which both contain $g \to \{1/2 :
O,\ 1/2 : h(g)\}$, but in addition $R_1'$ has the rule $h(x) \to \{1 : f(x, x)\}$ and $R_2'$
has the rule $h(x) \to \{1 : f(x, x, x)\}$. Similar to $R_1$ and $R_2$ in (10), $R_1'$ is AST
while $R_2'$ is not. In contrast, both $R_1'$ and $R_2'$ are iAST, since the innermost
evaluation strategy prevents the application of the h-rule to terms containing g.
Our DP framework handles $R_1'$ and $R_2'$ in the same way, as both have the same
DT $(G, g) \to \{1/2 : (c_0, O),\ 1/2 : (c_2(H(g), G), h(g))\}$ and a DT $(H(x), h(x)) \to
\{1 : (c_0, f(\ldots))\}$. Even if we allowed the application of the second DT to terms of
the form H(g), we would still obtain $|\mathfrak{T}|_{Leaf} = 1$ for every chain tree $\mathfrak{T}$. So a DP
framework to analyze “full” instead of innermost AST would be considerably
more involved.


4.2 The Probabilistic DP Framework 


Now we introduce the probabilistic dependency pair framework which keeps the
core ideas of the non-probabilistic framework. So instead of applying one ordering
for a PTRS directly as in Theorem 17, we want to benefit from modularity. Now
a DP processor Proc is of the form $Proc(P, S) = \{(P_1, S_1), \ldots, (P_n, S_n)\}$, where
$P, P_1, \ldots, P_n$ are PPTRSs and $S, S_1, \ldots, S_n$ are PTRSs. A processor Proc is
sound if (P, S) is iAST whenever $(P_i, S_i)$ is iAST for all $1 \leq i \leq n$. It is
complete if $(P_i, S_i)$ is iAST for all $1 \leq i \leq n$ whenever (P, S) is iAST. In the
following, we adapt the three main processors from Theorems 5, 6, and 7 to the
probabilistic setting and present two additional processors.

The (innermost) (P, S)-dependency graph indicates which DTs from P can
rewrite to each other using the PTRS S. The possibility of rewriting with S is
not related to the probabilities. Thus, for the dependency graph, we can use the
non-probabilistic variant $np(S) = \{\ell \to r_j \mid \ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in S,\ 1 \leq
j \leq k\}$.
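
Computing np(S) amounts to forgetting the probabilities and splitting each rule into one ordinary rule per element of its support. A minimal sketch, assuming (hypothetically) that a PTRS is given as a list of pairs of a left-hand side and a list of (probability, right-hand side) entries, with terms kept as strings:

```python
from fractions import Fraction

def non_probabilistic(ptrs):
    """np(S): one rule lhs -> r_j for every r_j in the support of a rule of S."""
    return {(lhs, rhs) for lhs, dist in ptrs for _prob, rhs in dist}

half = Fraction(1, 2)
# R_rw from Example 10
R_rw = [("g(x)", [(half, "x"), (half, "g(g(x))")])]
np_R_rw = non_probabilistic(R_rw)
```

For $R_{rw}$ this gives the two rules g(x) → x and g(x) → g(g(x)).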
Definition 30 (Dep. Graph). The node set of the (P, S)-dependency graph
is P and there is an edge from $(\ell_1^\#, \ell_1) \to \{p_1 : (d_1, r_1), \ldots, p_k : (d_k, r_k)\}$ to
$(\ell_2^\#, \ell_2) \to \ldots$ if there are substitutions $\sigma_1, \sigma_2$ and $t^\# \in cont(d_j)$ for some $1 \leq
j \leq k$ such that $t^\#\sigma_1 \overset{i}{\to}{}^{*}_{np(S)}\ \ell_2^\#\sigma_2$ and both $\ell_1^\#\sigma_1$ and $\ell_2^\#\sigma_2$ are in $NF_S$.

For $R_{pdiv}$ from Example 23, the $(DT(R_{pdiv}), R_{pdiv})$-dependency graph is on
the side. In the non-probabilistic DP framework, every step with $\overset{i}{\to}_{D,R}$
corresponds to an edge in the (D, R)-dependency graph. Similarly, in the
probabilistic setting, every path from one node of P to the next node of P in a
(P, S)-chain tree corresponds to an edge in the (P, S)-dependency graph. Since
every infinite path in a chain tree contains infinitely many nodes from P, when
tracking the arguments of the compound symbols, every such path traverses a cycle
of the dependency graph infinitely often. Thus, it again suffices to consider
the SCCs of the dependency graph separately. So for our example, we obtain
$Proc_{DG}(DT(R_{pdiv}), R_{pdiv}) = \{(\{(17)\}, R_{pdiv}), (\{(19)\}, R_{pdiv})\}$. To automate the
following two processors, the same over-approximation techniques as for the
non-probabilistic dependency graph can be used.

[Figure: the $(DT(R_{pdiv}), R_{pdiv})$-dependency graph with the nodes (16), (17), (18), and (19).]


Theorem 31 (Prob. Dep. Graph Processor). For the SCCs $P_1, \ldots, P_n$ of
the (P, S)-dependency graph, $Proc_{DG}(P, S) = \{(P_1, S), \ldots, (P_n, S)\}$ is sound and
complete.


Next, we introduce a new usable terms processor (a similar processor was
also proposed for the DTs in [37]). Since we regard dependency tuples instead
of pairs, after applying $Proc_{DG}$, the right-hand sides of DTs $(\ell_1^\#, \ell_1) \to \ldots$ might
still contain terms $t^\#$ where no instance $t^\#\sigma_1$ rewrites to an instance $\ell_2^\#\sigma_2$ of a
left-hand side of a DT (where we only consider instantiations such that $\ell_1^\#\sigma_1$ and
$\ell_2^\#\sigma_2$ are in $NF_S$, because only such instantiations are regarded in chain trees).
Then $t^\#$ can be removed from the right-hand side of the DT. For example, in the
DP problem $(\{(19)\}, R_{pdiv})$, the only DT (19) has the left-hand side D(s(x), s(y)).
As the term M(x, y) in (19)’s right-hand side cannot “reach” D(...), the following
processor removes it, i.e., $Proc_{UT}(\{(19)\}, R_{pdiv}) = \{(\{(21)\}, R_{pdiv})\}$, where (21)
is

(D(s(x), s(y)), div(s(x), s(y))) → {1/2 : (c_1(D(s(x), s(y))), div(s(x), s(y))),
    1/2 : (c_1(D(minus(x, y), s(y))), s(div(minus(x, y), s(y))))}.   (21)


So both Theorems 31 and 32 are needed to fully simulate the dependency 
graph processor in the probabilistic setting, i.e., they are both necessary to 
guarantee that the probabilistic DP processors work analogously to the non- 
probabilistic ones (which in turn ensures that the probabilistic DP framework 
is similar in power to its non-probabilistic counterpart). This is also confirmed 
by our experiments in Sect. 5 which show that disabling the processor of Theorem
32 affects the power of our approach. For example, without Theorem 32,
the proof that $R_{pdiv}$ is iAST in the probabilistic DP framework would require a
more complicated polynomial interpretation. In contrast, when using both processors
of Theorems 31 and 32, then one can prove iAST of $R_{pdiv}$ with the same
polynomial interpretation that was used to prove iTerm of $R_{div}$ (see Example
36).

Theorem 32 (Usable Terms Processor). Let $\ell_1^\#$ be a term and (P, S) be
a DP problem. We call a term $t^\#$ usable w.r.t. $\ell_1^\#$ and (P, S) if there is a
$(\ell_2^\#, \ell_2) \to \ldots \in P$ and substitutions $\sigma_1, \sigma_2$ such that $t^\#\sigma_1 \overset{i}{\to}{}^{*}_{np(S)}\ \ell_2^\#\sigma_2$ and
both $\ell_1^\#\sigma_1$ and $\ell_2^\#\sigma_2$ are in $NF_S$. If $d = c_n(t_1^\#, \ldots, t_n^\#)$, then $UT(d)_{\ell_1^\#,P,S}$ denotes
the term $c_m(t_{i_1}^\#, \ldots, t_{i_m}^\#)$, where $1 \leq i_1 < \ldots < i_m \leq n$ are the indices of all
terms $t_i^\#$ that are usable w.r.t. $\ell_1^\#$ and (P, S). The transformation that removes
all non-usable terms in the right-hand sides of dependency tuples is denoted by:

$T_{UT}(P, S) = \{(\ell^\#, \ell) \to \{p_1 : (UT(d_1)_{\ell^\#,P,S}, r_1), \ldots, p_k : (UT(d_k)_{\ell^\#,P,S}, r_k)\}$
    $\mid (\ell^\#, \ell) \to \{p_1 : (d_1, r_1), \ldots, p_k : (d_k, r_k)\} \in P\}$

Then $Proc_{UT}(P, S) = \{(T_{UT}(P, S), S)\}$ is sound and complete.


To adapt the usable rules processor, we adjust the definition of usable rules
such that it regards every term in the support of the distribution on the right-hand
side of a rule. The usable rules processor only deletes non-usable rules from
S, but not from $proj_2(P)$. This is sufficient, because according to Definition 24,
rules from $proj_2(P)$ can only be applied if they also occur in S.


Theorem 33 (Probabilistic Usable Rules Processor). Let (P, S) be a DP
problem. For every $f \in \Sigma \uplus \Sigma^\#$ let $Rules_S(f) = \{\ell \to \mu \in S \mid root(\ell) =
f\}$. For any term $t \in T(\Sigma \uplus \Sigma^\#, V)$, its usable rules $U_S(t)$ are the smallest
set such that $U_S(x) = \emptyset$ for all $x \in V$ and $U_S(f(t_1, \ldots, t_n)) = Rules_S(f) \cup
\bigcup_{i=1}^{n} U_S(t_i) \cup \bigcup_{\ell \to \mu \in Rules_S(f),\ r \in Supp(\mu)} U_S(r)$. The usable rules for (P, S) are
$U(P, S) = \bigcup_{\ell^\# \to \mu \in proj_1(P),\ d \in Supp(\mu)} U_S(d)$. Then $Proc_{UR}(P, S) = \{(P, U(P, S))\}$
is sound.


Example 34. For the DP problem $(\{(21)\}, R_{pdiv})$ only the minus-rules are usable
and thus $Proc_{UR}(\{(21)\}, R_{pdiv}) = \{(\{(21)\}, \{(12), (13)\})\}$. For $(\{(17)\}, R_{pdiv})$
there are no usable rules at all, hence $Proc_{UR}(\{(17)\}, R_{pdiv}) = \{(\{(17)\}, \emptyset)\}$.


For the reduction pair processor, we again restrict ourselves to multilinear
polynomials and use analogous constraints as in our new criterion for
the direct application of polynomial interpretations to PTRSs (Theorem 17),
but adapted to DP problems (P, S). Moreover, as in the original reduction
pair processor of Theorem 7, the polynomials only have to be weakly monotonic.
For every rule in S or $proj_1(P)$, we require that the expected value
is weakly decreasing. The reduction pair processor then removes those DTs
$(\ell^\#, \ell) \to \{p_1 : (d_1, r_1), \ldots, p_k : (d_k, r_k)\}$ from P where in addition there is
at least one term $d_j$ that is strictly decreasing. Recall that we can also rewrite
with the original rule $\ell \to \{p_1 : r_1, \ldots, p_k : r_k\}$ from $proj_2(P)$, provided that
it is also contained in S. Therefore, to remove the dependency tuple, we also
have to require that the rule $\ell \to r_j$ is weakly decreasing. Finally, we have to
use c-additive interpretations (with $c_{n,Pol}(x_1, \ldots, x_n) = x_1 + \ldots + x_n$) to handle
compound symbols and their normalization correctly.

Theorem 35 (Probabilistic Reduction Pair Processor). Let $Pol : T(\Sigma \uplus
\Sigma^\#, V) \to \mathbb{N}[V]$ be a weakly monotonic, multilinear, and c-additive polynomial
interpretation. Let $P = P_{\geq} \uplus P_{>}$ with $P_{>} \neq \emptyset$ such that:

(1) For every $\ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in S$, we have $Pol(\ell) \geq \sum_{1 \leq j \leq k} p_j \cdot Pol(r_j)$.
(2) For every $(\ell^\#, \ell) \to \{p_1 : (d_1, r_1), \ldots, p_k : (d_k, r_k)\} \in P$, we have $Pol(\ell^\#) \geq
    \sum_{1 \leq j \leq k} p_j \cdot Pol(d_j)$.
(3) For every $(\ell^\#, \ell) \to \{p_1 : (d_1, r_1), \ldots, p_k : (d_k, r_k)\} \in P_{>}$, there exists a
    $1 \leq j \leq k$ with $Pol(\ell^\#) > Pol(d_j)$.
    If $\ell \to \{p_1 : r_1, \ldots, p_k : r_k\} \in S$, then we additionally have $Pol(\ell) \geq Pol(r_j)$.

Then $Proc_{RP}(P, S) = \{(P_{\geq}, S)\}$ is sound and complete.


Example 36. The constraints of the reduction pair processor for the two DP
problems from Example 34 are satisfied by the c-additive polynomial interpretation
which again maps O to 0, s(x) to x + 1, and all other non-constant
function symbols to the projection on their first arguments. As in the
non-probabilistic case, this results in DP problems of the form $(\emptyset, \ldots)$ and
subsequently, $Proc_{DG}(\emptyset, \ldots)$ yields $\emptyset$. By the soundness of all processors, this proves
that $R_{pdiv}$ is iAST.
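
The constraints for the DT (21) can be checked mechanically. The sketch below is only an illustration under several assumptions: terms are encoded as nested tuples (variables as strings, compound symbols with head `"c"`), and since the interpretation of this example is linear, polynomials are stored as dictionaries from variables (or the key `1` for the constant part) to coefficients. The comparison `geq` is a sufficient coefficient-wise check, not a decision procedure for arbitrary polynomials, and all function names are hypothetical.

```python
from fractions import Fraction

def add(p, q, scale=Fraction(1)):
    """Return p + scale * q for linear polynomials given as dicts."""
    out = dict(p)
    for key, coeff in q.items():
        out[key] = out.get(key, Fraction(0)) + scale * coeff
    return out

def pol(term):
    """Interpretation of Example 36 as a linear polynomial."""
    if isinstance(term, str):                  # variable
        return {term: Fraction(1)}
    head, *args = term
    if head == "O":                            # O -> 0
        return {}
    if head == "s":                            # s(x) -> x + 1
        return add(pol(args[0]), {1: Fraction(1)})
    if head == "c":                            # c-additive: sum of the arguments
        total = {}
        for arg in args:
            total = add(total, pol(arg))
        return total
    return pol(args[0])                        # projection on the first argument

def expected(dist):
    """Expected value of a multi-distribution [(p_1, d_1), ..., (p_k, d_k)]."""
    total = {}
    for prob, term in dist:
        total = add(total, pol(term), prob)
    return total

def geq(p, q, strict=False):
    """Sufficient check that p >= q (or p > q) holds for all natural numbers."""
    diff = add(p, q, Fraction(-1))
    if any(c < 0 for c in diff.values()):
        return False
    return diff.get(1, 0) > 0 if strict else True

half = Fraction(1, 2)
lhs_21 = ("D", ("s", "x"), ("s", "y"))
d1 = ("c", ("D", ("s", "x"), ("s", "y")))
d2 = ("c", ("D", ("minus", "x", "y"), ("s", "y")))
weak_21 = geq(pol(lhs_21), expected([(half, d1), (half, d2)]))   # condition (2)
strict_21 = geq(pol(lhs_21), pol(d2), strict=True)               # condition (3), j = 2
```

Here Pol of the left-hand side is x + 1, the expected value of the right-hand side is x + 1/2, and Pol($d_2$) = x, so both checks succeed. The same functions can be applied to (17) and to the rules (12) and (13).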


So with the new probabilistic DP framework, the proof that $R_{pdiv}$ is iAST
is analogous to the proof that $R_{div}$ is iTerm in the original DP framework (the
proofs even use the same polynomial interpretation in the respective reduction 
pair processors). This indicates that our novel framework for PTRSs has the 
same essential concepts and advantages as the original DP framework for TRSs. 
This is different from our previous adaption of dependency pairs for complexity 
analysis of TRSs, which also relies on dependency tuples [37]. There, the power is 
considerably restricted, because one does not have full modularity as one cannot 
decompose the proof according to the SCCs of the dependency graph. 

In proofs with the probabilistic DP framework, one may obtain DP problems
(P,S) that have a non-probabilistic structure (i.e., every DT in P has the form
$\langle \ell^{\#}, \ell \rangle \to \{1 : \langle d, r \rangle\}$ and every rule in S has the form $\ell' \to \{1 : r'\}$). We now
introduce a processor that allows us to switch to the original non-probabilistic 
DP framework for such (sub-)problems. This is advantageous, because due to 
the use of dependency tuples instead of pairs in P, in general the constraints 
of the probabilistic reduction pair processor of Theorem 35 are harder than 
the ones of the reduction pair processor of Theorem 7. Moreover, Theorem 7 
is not restricted to multilinear polynomial interpretations and the original DP 
framework has many additional processors that have not yet been adapted to 
the probabilistic setting. 


Theorem 37 (Probability Removal Processor). Let (P,S) be a probabilistic
DP problem where every DT in P has the form $\langle \ell^{\#}, \ell \rangle \to \{1 : \langle d, r \rangle\}$ and
every rule in S has the form $\ell' \to \{1 : r'\}$. Let $np(P) = \{\ell^{\#} \to t^{\#} \mid \ell^{\#} \to
\{1 : d\} \in proj_1(P),\ t^{\#} \in cont(d)\}$. Then (P,S) is iAST iff the non-probabilistic
DP problem (np(P),np(S)) is iTerm. So if (np(P),np(S)) is iTerm, then the
processor $\mathit{Proc}_{PR}(P, S) = \emptyset$ is sound and complete.

5 Conclusion and Evaluation


Starting with a new “direct” technique to prove almost-sure termination of prob- 
abilistic TRSs (Theorem 17), we presented the first adaption of the dependency 
pair framework to the probabilistic setting in order to prove innermost AST auto- 
matically. This is not at all obvious, since most straightforward ideas for such 
an adaption are unsound (as discussed in Sect. 4.1). So the challenge was to find 
a suitable definition of dependency pairs (resp. tuples) and chains (resp. chain 
trees) such that one can define DP processors which are sound and work analo- 
gously to the non-probabilistic setting (in order to obtain a framework which is 
similar in power to the non-probabilistic one). While the soundness proofs for our 
new processors are much more involved than in the non-probabilistic case, the 
new processors themselves are quite analogous to their non-probabilistic coun- 
terparts and thus, adapting an existing implementation of the non-probabilistic 
DP framework to the probabilistic one does not require much effort. 

We implemented our contributions in our termination prover AProVE, which 
yields the first tool to prove almost-sure innermost termination of PTRSs on 
arbitrary data structures (including PTRSs that are not PAST). In our exper- 
iments, we compared the direct application of polynomials for proving AST 
(via our new Theorem 17) with the probabilistic DP framework. We evaluated 
AProVE on a collection of 67 PTRSs which includes many typical probabilistic 
algorithms. For example, it contains the following PTRS $\mathcal{R}_{qs}$ for probabilistic
quicksort.


rotate(cons(x, xs)) → {1/2 : cons(x, xs), 1/2 : rotate(app(xs, cons(x, nil)))}
qs(nil) → {1 : nil}
qs(cons(x, xs)) → {1 : qsHelp(rotate(cons(x, xs)))}
qsHelp(cons(x, xs)) → {1 : app(qs(low(x, xs)), cons(x, qs(high(x, xs))))}


The rotate-rules rotate a list randomly often (they are AST, but not
terminating). Thus, by choosing the first element of the resulting list, one obtains
a random pivot element for the recursive call of quicksort. In addition to the
rules above, $\mathcal{R}_{qs}$ contains rules for list concatenation (app), and rules such that
low(x, xs) (resp. high(x, xs)) returns all elements of the list xs that are smaller
(resp. greater or equal) than x, see [28]. Using the probabilistic DP framework,
AProVE can prove iAST of $\mathcal{R}_{qs}$ and many other typical programs.
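
Read operationally, the rules above describe an ordinary randomized program. The following Python sketch mirrors them under some assumptions of ours: lists stand in for cons/nil terms, `rng` is a hypothetical random source passed in explicitly, and `low`/`high` are written as list comprehensions because their rewrite rules are only given in [28]. It illustrates the behaviour of the rules; it is not the PTRS itself and says nothing about how AProVE analyzes it.

```python
import random


def rotate(lst, rng):
    """Mirror of the rotate rule: with probability 1/2 stop,
    otherwise move the head to the end of the list and try again.
    The loop has no fixed bound, but it stops with probability 1."""
    lst = list(lst)
    while rng.random() >= 0.5:
        lst = lst[1:] + lst[:1]
    return lst


def qs(lst, rng):
    """Mirror of the qs/qsHelp rules: rotate, then use the head as pivot."""
    if not lst:                      # qs(nil) -> nil
        return []
    x, *xs = rotate(lst, rng)        # qsHelp(cons(x, xs))
    low = [y for y in xs if y < x]   # elements smaller than the pivot
    high = [y for y in xs if y >= x] # elements greater than or equal to it
    return qs(low, rng) + [x] + qs(high, rng)
```

A call such as `qs([3, 1, 2], random.Random(0))` returns the sorted list; only the number of rotations, and hence the running time, depends on the random choices.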

61 of the 67 examples in our collection are iAST and AProVE can prove iAST 
for 53 (87%) of them. Here, the DP framework proves iAST for 51 examples and 
the direct application of polynomial interpretations via Theorem 17 succeeds for 
27 examples. (In contrast, proving PAST via the direct application of polynomial 
interpretations as in [3] only works for 22 examples.) The average runtime of 
AProVE per example was 2.88s (where no example took longer than 8s). So our 
experiments indicate that the power of the DP framework can now also be used 
for probabilistic TRSs. 

We also performed experiments where we disabled individual processors of
the probabilistic DP framework. More precisely, we disabled either the usable
terms processor (Theorem 32), both the dependency graph and the usable terms
processor (Theorems 31 and 32), or all processors except the reduction pair
processor of Theorem 35. Our experiments show that disabling processors indeed
affects the power of the approach, in particular for larger examples with several
defined symbols (e.g., then AProVE cannot prove iAST of $\mathcal{R}_{qs}$ anymore). So
all of our processors are needed to obtain a powerful technique for termination
analysis of PTRSs.

Due to the use of dependency tuples instead of pairs, the probabilistic DP 
framework does not (yet) subsume the direct application of polynomials com- 
pletely (two examples in our collection can only be proved by the latter, see 
[28]). Therefore, currently AProVE uses the direct approach of Theorem 17 in 
addition to the probabilistic DP framework. In future work, we will adapt fur- 
ther processors of the original DP framework to the probabilistic setting, which 
will also allow us to integrate the direct approach of Theorem 17 into the prob- 
abilistic DP framework in a modular way. Moreover, we will develop processors 
to prove AST of full (instead of innermost) rewriting. Further work may also 
include processors to disprove (i)AST and possible extensions to analyze PAST 
and expected runtimes as well. Finally, one could also modify the formalism of 
PTRSs in order to allow non-constant probabilities which depend on the sizes 
of terms. 

For details on our experiments and for instructions on how to run our
implementation in AProVE via its web interface or locally, we refer to
https://aprove-developers.github.io/ProbabilisticTermRewriting/.


Acknowledgements. We are grateful to Marcel Hark, Dominik Meier, and Florian 

Abstract. This paper describes the formal verification of NP-hardness
reduction functions of two key problems relevant in algebraic lattice the- 
ory: the closest vector problem and the shortest vector problem, both 
in the infinity norm. The formalization uncovered a number of problems 
with the existing proofs in the literature. The paper describes how these 
problems were corrected in the formalization. The work was carried out 
in the proof assistant Isabelle. 

1 Introduction


In recent years, algebraic lattices have received increasing attention for their 
use in post-quantum cryptography. Algebraic lattices are additive, discrete
subgroups of $\mathbb{R}^n$, i.e. a set of points in $\mathbb{R}^n$ with certain structures. One can also define
lattices over finite fields, rings or modules as used in many modern post-quantum 
crypto systems such as the CRYSTALS suites, NTRU and Saber. 

Two problems form the very basis for computationally hard problems on lat- 
tices, namely the closest vector problem (CVP) and the shortest vector problem 
(SVP). Given a finite set of basis vectors in $\mathbb{R}^n$, the set of all linear combinations
with integer coefficients forms a lattice. In optimization form, the SVP asks for 
the shortest vector in the lattice and the CVP asks for the lattice vector closest 
to some given target vector, both with respect to some given norm. 

When working over the reals, the p-norm (for $p \ge 1$) is defined as $\|x\|_p = \sqrt[p]{\sum_i |x_i|^p}$.
The most common examples are the Euclidean norm $\|x\|_2$ and the infinity norm
$\|x\|_\infty = \max_i \{|x_i|\}$, which is the limit for $p \to \infty$.
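
As a small numeric illustration of these definitions, the following Python sketch computes the p-norm and the infinity norm of a vector given as a plain list. The function names `p_norm` and `inf_norm` are ours and do not come from the formalization.

```python
def p_norm(x, p):
    """p-norm of x for a finite p >= 1: the p-th root of sum_i |x_i|^p."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def inf_norm(x):
    """Infinity norm of x: the largest absolute entry."""
    return max(abs(xi) for xi in x)


x = [3, -4, 1]
euclidean = p_norm(x, 2)      # sqrt(9 + 16 + 1)
large_p = p_norm(x, 200)      # already very close to the infinity norm
limit = inf_norm(x)           # 4
```

For growing p the value of `p_norm(x, p)` approaches `inf_norm(x)` from above, which is the limit statement made in the text.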

We have formalized, corrected and verified a number of NP-hardness proofs 
from the literature, uncovering a number of mistakes along the way. The first 
NP-hardness proof of the CVP and SVP in infinity norm is due to van Emde- 
Boas [7]. For other norms (especially for the Euclidean norm), there is only a 
randomized reduction for the NP-hardness of the SVP so far [2]. For the CVP, 
NP-hardness has been shown in any p-norm for $p \ge 1$. One exemplary proof can
be found in the book by Micciancio and Goldwasser [15, Chapter 3, Thm 3.1]. 

The CVP and SVP were the starting point for lattice-based post-quantum 
cryptography [16]. Moreover, the relevance of these problems can also be seen 
from the rich literature on approximation results. For example, the LLL- 
algorithm by Lenstra, Lenstra and Lovász [12] gives a polynomial-time algorithm 
for lattice basis reduction which solves integer linear programs in fixed dimen- 
sions. Using this reduced basis, one can find good approximations to the CVP 
using Babai’s algorithm [3] for certain approximation factors. Still, for arbitrary 
dimensions, the problem remains NP-hard. Further approximation results for 
the CVP, SVP and integer programming can be found elsewhere [6,9, 10, 14,19]. 
These approximation problems are used in cryptography. However, we will focus 
on the exact CVP and SVP in this paper. 

A number of more basic NP-hardness proofs have been formalized in several 
theorem provers so far. For example, there are formalizations of the Cook-Levin 
Theorem in Coq [8] and Isabelle [4]. Formalizing Karp’s 21 NP-hard problems 
(including the Subset Sum and Partition Problems assumed to be NP-hard in 
this paper) in Isabelle is an ongoing project. 


1.1 Contributions 


In this paper we present NP-hardness proofs of the CVP and SVP in infinity 
norm that have been verified in a proof assistant. We roughly follow the book by 
Micciancio and Goldwasser [15, Chapter 3, Thm 3.1] and the report by van Emde-Boas [7].
However, many problems with the original proofs were encountered
during the formalization efforts. We will have a look at different approaches and 
their advantages or problems. 

We also verified the proof of NP-hardness of the CVP for any finite $p \ge 1$
from the book by Micciancio and Goldwasser. This verification did not uncover 
any problems with the informal proof. Thus we do not discuss it in detail. 

These formalizations were carried out with the help of the proof assistant 
Isabelle [17,18] and are available online [11]. They comprise 5200 lines. To the
authors' knowledge, they are the first formalizations of hardness proofs for lattice
problems. Because of the importance of the SVP and CVP and the problems 
in existing proofs, we consider our proofs a contribution to the foundations of 
verified cryptography. However, we do not claim that these hardness results 
directly imply quantum-resistance of any lattice-based cryptosystems. 


1.2 Overview 


The paper is structured as follows. Section 2 introduces the foundations. The 
rest of the paper is dedicated to the proofs, which are phrased as the following 
two polynomial time reduction chains: 


- Subset Sum $\le_p$ CVP
- Partition $\le_p$ Bounded Homogeneous Linear Equations $\le_p$ SVP

Subset Sum and Partition are famous fundamental problems whose NP-hardness
has been proved many times in the literature and which we take for granted. 

Section 3 presents the reduction of Subset Sum to the CVP. Differences 
between our formalization and the book by Micciancio and Goldwasser [15] are 
presented with examples that demonstrate problems with the original proof. 
Moreover, an example is given why the generalization to the SVP given in [15] 
does not work. 

Therefore we turn to the early proof of NP-hardness of the SVP by van Emde 
Boas [7]. This proof uses the Bounded Homogeneous Linear Equations problem 
(BHLE) which is introduced in Sect. 4. The formalization of this proof is one
of the major achievements in this paper. It posed a significant challenge since 
it often relied on human intuition and had to be restructured appropriately to 
allow a formal proof. The main proof steps are explained and difficulties in the 
formalization effort are described. This proof only works in infinity norm and we 
explain why. In Sect. 5, the reduction from BHLE to the SVP is given. Again,
this proof was quite elaborate to formalize as there were inaccuracies and a 
lot of intuition was involved. Differences between the formal proof and [7] are 
explained by examples. 

In Sect. 6, we have a quick look at the reduction proof for the CVP in p-norm
(for finite $p \ge 1$). In the case of the SVP there only exists a randomized hardness
proof in Euclidean norm by Ajtai [1] up to now.

Finally, the time complexity of the reduction functions is considered in
Sect. 7. We conclude the paper with a short summary and outlook.


2 Foundations 


This section introduces known foundations mainly to fix the terminology and 
notation: problem reductions, lattices, and the combinatorial problems under 
consideration (CVP, SVP, Partition and Subset Sum). 


2.1 Problem Reductions 


Formally, a decision problem is given by the set of YES-instances P and a set
$\Gamma$ of problem instances, where $P \subseteq \Gamma$. We often associate the decision
problem with the set of YES-instances, when the instance set $\Gamma$ is obvious and not
explicitly defined. In this paper we will often phrase problems informally (e.g.
“decide if p is prime”) rather than give them explicitly as sets. For example, the
decision problem “decide if a natural number p is prime” will be formalized in
the following way: the set of problem instances is $\Gamma = \mathbb{N}$ (in Isabelle these are
all elements of type nat); and the YES-instances are $P = \{p \in \mathbb{N} \mid p \text{ is prime}\}$
(in Isabelle this is a set of type nat set).


Definition 1 (Problem reduction). Let $A \subseteq \Gamma$ and $B \subseteq \Delta$ be two problems.
A function $f : \Gamma \to \Delta$ is a reduction from A to B if it fulfills the following
properties:

- $\forall a \in \Gamma.\ a \in A \iff f(a) \in B$

- f can be computed in polynomial time


If A is NP-hard, a reduction to B proves NP-hardness of B. 

In this paper we present reduction functions informally (e.g. “an a is reduced 
to a b that is constructed like this”) and often with copious amounts of “...” to 
construct vectors etc. Of course in the formalization these reduction functions 
are spelled out in complete detail. Since all operations used in the reduction 
functions in this paper are elementary, the polynomial time property has not 
been formalized but is briefly discussed in Sect. 7. The focus of our paper is on
the proofs of $a \in A \iff f(a) \in B$.


2.2  Lattice-Based Computational Problems 


To have a better understanding, we will first introduce lattices as such. Lattices
are a structured set of points. They form an additive, discrete subgroup of $\mathbb{R}^n$.
Formally, we define the following.


Definition 2 (Lattice). Let $A = \{a_1, \ldots, a_m\} \subset \mathbb{R}^n$ be a set of linearly
independent vectors. Then the integer span of A forms a lattice L, that is:

$$L = \left\{ \sum_{i=1}^{m} c_i a_i \;\middle|\; c_i \in \mathbb{Z} \right\}$$


[Figure with two panels: (a) Lattice with rectangular basis vectors, (b) Lattice with triangular basis vectors]


Fig. 1. Two exemplary lattices in $\mathbb{R}^2$


Example 1. In Fig. 1 two examples of lattices in $\mathbb{R}^2$ are depicted. The red point
is the origin. The two blue arrows show the basis vectors $a_1$ and $a_2$ that are
linearly independent and span the lattice. Every integer combination of the two 
blue arrows is a black point, an element of the lattice. 

We can see that the grid spanned by the basis vectors is discrete and has some 
recurring structures. These structures are determined by the basis vectors: the 
angle between them and their length. In Fig. 1a, the angle between the two basis
vectors is 90° yielding a rectangular fundamental domain. Whereas in Fig. 1b, we 
have an angle of 60° between the basis vectors and equal length. This produces 
a fundamental domain of an equilateral triangle. 

Indeed, the automorphism group of a lattice is a symmetry group, see
Conway [5, Chapter 3.4]. For example, in Fig. 1a the symmetry group is pmm and
in Fig. 1b it is p3m1 [13].


In the rest of the text and in the formalization we restrict to finite bases over 
Z (instead of R), simply for computability reasons. Of course bases over Q can 
be transformed into bases over Z by scaling all basis vectors. 

The starting point of most known hard problems on lattices are the shortest 
vector problem and the closest vector problem. They are defined below (as usual 
in decision and not in optimization form). The lattice $L \subseteq \mathbb{Z}^n$ is assumed to be
generated by a finite basis in $\mathbb{Z}^n$.


Definition 3 (Closest Vector Problem (CVP)). Given a lattice L, a vector
$b \in \mathbb{Z}^n$ and an estimate k, decide whether there exists a vector $v \in L$ such that

$$\|v - b\| \le k$$


Definition 4 (Shortest Vector Problem (SVP)). Given a lattice L and
an estimate k, determine whether there exists a vector $v \in L$ such that

$$\|v\| \le k \quad \text{and} \quad v \ne 0$$
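
To make the two decision problems concrete, here is a deliberately naive Python sketch in the infinity norm. It is our illustration only: it searches integer coefficients in the range -bound..bound, so `True` answers are reliable while `False` merely means that no witness exists within that (hypothetical) bound. The basis is assumed to be a list of linearly independent integer vectors.

```python
from itertools import product


def lattice_vector(basis, coeffs):
    """Integer combination sum_i coeffs[i] * basis[i] of the basis vectors."""
    dim = len(basis[0])
    return [sum(c * a[j] for c, a in zip(coeffs, basis)) for j in range(dim)]


def inf_norm(v):
    return max(abs(vi) for vi in v)


def cvp_bounded(basis, b, k, bound):
    """Is there a lattice vector v (coefficients within the bound) with ||v - b|| <= k?"""
    for coeffs in product(range(-bound, bound + 1), repeat=len(basis)):
        v = lattice_vector(basis, coeffs)
        if inf_norm([vi - bi for vi, bi in zip(v, b)]) <= k:
            return True
    return False


def svp_bounded(basis, k, bound):
    """Is there a nonzero lattice vector v (coefficients within the bound) with ||v|| <= k?"""
    for coeffs in product(range(-bound, bound + 1), repeat=len(basis)):
        v = lattice_vector(basis, coeffs)
        if any(v) and inf_norm(v) <= k:
            return True
    return False
```

For the rectangular basis `[[2, 0], [0, 3]]`, for example, `svp_bounded(basis, 2, 2)` succeeds with the vector (2, 0), while no nonzero lattice vector has infinity norm 1.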


2.3 Partition and Subset Sum Problems 


Recall that we plan to prove NP-hardness of the CVP and SVP in the case of 
the infinity norm by reducing the well-studied NP-complete Subset Sum and 
Partition problems to the CVP and SVP. We state the definitions. 


Definition 5 (Partition problem). Given a finite list of integers $a_1, \ldots, a_n$,
does there exist a partition of $\{1 \ldots n\}$ into subsets I and $\{1 \ldots n\} \setminus I$ such that

$$\sum_{i \in I} a_i = \sum_{i \in \{1 \ldots n\} \setminus I} a_i$$


The Partition problem can be seen as a special case of the Subset Sum 
problem. 


Definition 6 (Subset Sum problem). Given a finite list of integers
$a_1, \ldots, a_n$ and an integer s, decide whether there exists a subset S of $\{1 \ldots n\}$
such that

$$\sum_{i \in S} a_i = s$$
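
The remark that Partition is a special case of Subset Sum can be made explicit: a list can be partitioned exactly when its total is even and some subset sums to half of it. The brute-force Python sketch below (exponential time, our own helper names) is only meant to illustrate the two definitions.

```python
from itertools import combinations


def subset_sum(a, s):
    """Is there an index set S with sum of a[i] for i in S equal to s?"""
    n = len(a)
    return any(sum(a[i] for i in idx) == s
               for r in range(n + 1)
               for idx in combinations(range(n), r))


def partition(a):
    """Partition as the Subset Sum instance (a, sum(a) / 2)."""
    total = sum(a)
    return total % 2 == 0 and subset_sum(a, total // 2)
```

For instance, `partition([1, 2, 3])` holds because of the split {3} versus {1, 2}, whereas `[1, 2, 4]` has an odd total and cannot be partitioned.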


370 K. Kreuzer and T. Nipkow 


2.4 Notation 


Throughout the paper we use traditional mathematical notation, in particular 
the graphical “...”. The formal Isabelle notation is by necessity more verbose 
(and precise). Our formalization employs both lists and vectors as a type for 
finite sequences and converts between them where necessary. For reasons of pre- 
sentation we blur this distinction in the paper. 


3 CVP 


In this section, we formalize the proof of the NP-hardness of the CVP in the 
infinity norm along the lines of [15, p. 48, Chap. 3.2, Thm 3.1] by reducing Subset
Sum to the CVP. 


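An instance $a_1, \ldots, a_n, s$ of Subset Sum is mapped to the following instance
of the CVP:

$$L = \begin{pmatrix} a_1 & \cdots & a_n \\ a_1 & \cdots & a_n \\ 2 & & 0 \\ & \ddots & \\ 0 & & 2 \end{pmatrix} \cdot \mathbb{Z}^n \qquad b = \begin{pmatrix} s-1 \\ s+1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad k = 1 \tag{1}$$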

We proved the following theorem: 


Theorem 1. The above mapping is a reduction from the Subset Sum problem 
to the CVP (in infinity norm). 


This implies that the CVP (in infinity norm) is an NP-hard problem. 
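The reduction function used by Micciancio and Goldwasser [15] actually looks
a bit different. The image of $a_1, \ldots, a_n, s$ would be

$$B = \begin{pmatrix} a_1 & \cdots & a_n \\ 2 & & 0 \\ & \ddots & \\ 0 & & 2 \end{pmatrix} \qquad L = B \cdot \mathbb{Z}^n \qquad b = \begin{pmatrix} s \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad k = 1 \tag{2}$$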

However, the proof in [15, p. 49] with this reduction function works only for
$p < \infty$. It goes along the lines of the following idea: Take $k = \sqrt[p]{n}$. In the case
of $p = \infty$, we get $k = \lim_{p \to \infty} \sqrt[p]{n} = 1$. Then we can formulate the following
equality (equation (3.5) in [15, p. 49]):

$$\|Bx - b\|_p^p = \left| \sum_{i=1}^{n} a_i x_i - s \right|^p + \sum_{i=1}^{n} |2x_i - 1|^p \tag{3}$$

Given a YES-instance $a_1, \ldots, a_n, s$ of Subset Sum, there exists a vector $x =
(x_1, \ldots, x_n) \in \{0,1\}^n$ such that $\sum_{i=1}^{n} a_i x_i - s = 0$ and $|2x_i - 1| = 1$. Then
$\|Bx - b\|_p^p = n$, which proves this case.

Given a YES-instance of the CVP defined by L, b and k that are the image of
$a_1, \ldots, a_n, s$ under the reduction function as in (2), we get $\|Bx - b\|_p^p \le n$. Since
all values are integers, we have $|2x_i - 1| \ge 1$. It follows that $\sum_{i=1}^{n} a_i x_i - s = 0$ and
$|2x_i - 1| = 1$. Thus, we can deduce that $a_1, \ldots, a_n, s$ was indeed a YES-instance
of Subset Sum.

The major problem we encountered was that this proof works fine for $p < \infty$
but for $p = \infty$, the sum in (3) becomes a maximum instead. The equation then
reads

$$\|Bx - b\|_\infty = \max\left( \left| \sum_{i=1}^{n} a_i x_i - s \right|,\ |2x_1 - 1|,\ \ldots,\ |2x_n - 1| \right)$$

This invalidates the arguments in the proof since $\sum_{i=1}^{n} a_i x_i - s$ can now be in
the range $\{-1, 0, 1\}$. The constraints are too lax to ensure the equality to zero.

A solution was to alter the matrix and target vector and add another entry.
The matrix and target vector we used are given in Eq. (1). The alteration to
$s - 1$ and $s + 1$ forces a linear combination of the $a_i$ to be exactly s in the
hardness proof, since $|\sum_i c_i a_i - (s \pm 1)| \le 1$.
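
The difference between the mappings (1) and (2) in the infinity norm can be checked on small instances. The Python sketch below is our own illustration, not part of the Isabelle development. It uses the fact that with k = 1 the rows containing a 2 force $|2x_i - 1| \le 1$, i.e. $x_i \in \{0, 1\}$, so enumerating 0/1 coefficient vectors decides these particular CVP instances exactly.

```python
from itertools import product


def subset_sum(a, s):
    return any(sum(ai * xi for ai, xi in zip(a, x)) == s
               for x in product((0, 1), repeat=len(a)))


def cvp_instance_fixed(a, s):
    """Mapping (1): rows a, a, 2*I_n; target (s-1, s+1, 1, ..., 1); k = 1."""
    n = len(a)
    rows = [list(a), list(a)]
    rows += [[2 if i == j else 0 for j in range(n)] for i in range(n)]
    return rows, [s - 1, s + 1] + [1] * n, 1


def cvp_instance_book(a, s):
    """Mapping (2): rows a, 2*I_n; target (s, 1, ..., 1); k = 1."""
    n = len(a)
    rows = [list(a)]
    rows += [[2 if i == j else 0 for j in range(n)] for i in range(n)]
    return rows, [s] + [1] * n, 1


def cvp_inf(rows, b, k):
    """Decide the CVP instance in infinity norm by enumerating x in {0,1}^n.
    This is exact here because k = 1 and every instance contains 2*I_n."""
    n = len(rows[0])
    for x in product((0, 1), repeat=n):
        diff = [sum(r[j] * x[j] for j in range(n)) - bi for r, bi in zip(rows, b)]
        if max(abs(d) for d in diff) <= k:
            return True
    return False
```

The instance a = (2), s = 1 has no solution, yet mapping (2) yields a YES-instance of the CVP (take x = 0, which gives distance 1), while mapping (1) correctly yields a NO-instance.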

When we communicated with Daniele Micciancio, one of the authors of [15], he
suggested using a constant c > 1 and the generating instance

$$L = \begin{pmatrix} c \cdot a_1 & \cdots & c \cdot a_n \\ 2 & & 0 \\ & \ddots & \\ 0 & & 2 \end{pmatrix} \cdot \mathbb{Z}^n \qquad b = \begin{pmatrix} c \cdot s \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad k = 1$$

This solves the problem as well and can be implemented using e.g. c = 2. This
technique is described later in the book [15, pp. 49-51] when trying to explain
the NP-hardness proof for the SVP in the infinity norm.


3.1 Towards the SVP 


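The authors of [15] argue that the reduction argument of the SVP can be deduced
by generating an instance of the SVP using the Subset Sum instance $a_1, \ldots, a_n, s$
in the following way. For c > 1, e.g. c = 2, take

$$B = \begin{pmatrix} c \cdot a_1 & \cdots & c \cdot a_n & c \cdot s \\ 2 & & 0 & 1 \\ & \ddots & & \vdots \\ 0 & & 2 & 1 \end{pmatrix} \qquad L = B \cdot \mathbb{Z}^{n+1} \qquad k = 1$$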

The authors claim that every shortest vector in the image of the reduction
function has $-1$ as last coefficient. For example, let a YES-instance of the SVP be
defined by the generating matrix B of the lattice and let $x = (x_1, \ldots, x_n, -1)^T$
be the coefficients such that Bx is a shortest vector. Then we know that

$$\|Bx\|_\infty = \left\| \begin{pmatrix} c \cdot (x_1 a_1 + \cdots + x_n a_n - s) \\ 2x_1 - 1 \\ \vdots \\ 2x_n - 1 \end{pmatrix} \right\|_\infty \le 1$$

Since c > 1, it follows that $x_1 a_1 + \cdots + x_n a_n - s = 0$, which yields a solution
for the given Subset Sum instance $a_1, \ldots, a_n, s$.
However, this reduction does not always work as the following example shows:


Example 2. Given the Subset Sum instance $(a_1, a_2, a_3, s) = (1,1,1,1)$. This is a
YES-instance, since a solution is given by $x_1 = 1$, $x_2 = 0$ and $x_3 = 0$. The basis
matrix of the corresponding SVP would be (with c > 1)

$$B = \begin{pmatrix} c & c & c & c \\ 2 & 0 & 0 & 1 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 1 \end{pmatrix}$$

Take for example the vector $v = B \cdot (-1,-1,-1,3)^T = (0,1,1,1)^T$. It has
infinity norm 1 and is thus a shortest vector in the lattice generated by B.
However, this vector has the last coefficient 3 and not $-1$, even though it clearly
is a shortest vector of the lattice given by B. The corresponding scaled “solution”
for Subset Sum would be $(1/3, 1/3, 1/3, -1)$ but since only integer values are
allowed in the solution space, this is not a solution in our sense.

We consider another example. Let the Subset Sum instance be $a_1 = 3$, $s' = 1$.
We can easily see that this is not a YES-instance, i.e. there exists no solution.
Still, the corresponding SVP instance given via the reduction function is
generated by the matrix

$$B' = \begin{pmatrix} c \cdot 3 & c \cdot 1 \\ 2 & 1 \end{pmatrix}$$

In this case the coefficients $(-1, 3)^T$ yield a shortest vector in the lattice spanned
by $B'$, since

$$\left\| B' \cdot \begin{pmatrix} -1 \\ 3 \end{pmatrix} \right\|_\infty = \left\| \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\|_\infty = 1$$

Thus, $B'$ defines a YES-instance of the SVP, but the original Subset Sum
instance is not a YES-instance.

In [15], it is stated for the infinity norm that any shortest vector yields a 
solution for the Subset Sum Problem, which is not the case in these examples: 
we cannot ensure that a shortest vector always has —1 as a last coordinate. 
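
Both counterexamples above are plain matrix-vector computations and can be replayed directly. The Python sketch below is our own check; it fixes c = 2 as one admissible choice of c > 1, and the helper names are ours.

```python
def mat_vec(B, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]


def inf_norm(v):
    return max(abs(vi) for vi in v)


c = 2  # any integer c > 1 gives the same two result vectors

# Example with (a1, a2, a3, s) = (1, 1, 1, 1): last coefficient 3, not -1
B = [[c, c, c, c],
     [2, 0, 0, 1],
     [0, 2, 0, 1],
     [0, 0, 2, 1]]
v = mat_vec(B, [-1, -1, -1, 3])

# Example with a1 = 3, s' = 1: a NO-instance of Subset Sum
B_prime = [[3 * c, c],
           [2, 1]]
w = mat_vec(B_prime, [-1, 3])
```

In both cases the first entry cancels because the coefficients sum to zero against the scaled row, which is exactly what the argument in [15] does not rule out.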


Although the proof in [15] does not work out as expected, there is still 
the reduction proof by van Emde-Boas [7] which reduces a problem called the 
Bounded Homogeneous Linear Equation problem to the SVP in infinity norm. 
This will be discussed in the next two sections. 

4 Bounded Homogeneous Linear Equations


A technical report by Peter van Emde-Boas [7] gives another reduction proof 
for the NP-hardness of the SVP in infinity norm. The author first reduces the 
Partition Problem to a problem called Bounded Homogeneous Linear Equation 
(BHLE) which is then reduced to the SVP. 


Definition 7 (Bounded Homogeneous Linear Equations problem).
Given a finite vector of integers $b \in \mathbb{Z}^n$ and a positive integer k, decide whether
there exists an $x \in \mathbb{Z}^n \setminus \{0\}$ with $\|x\|_\infty \le k$ such that

$$\langle b, x \rangle = 0$$


We have verified a reduction from Partition to BHLE, and thus BHLE is 
NP-hard. 


Theorem 2. There is a reduction from Partition to BHLE in infinity norm. 


The proof is carefully engineered and rather intricate. Differences to the original 
proof and problems encountered during the formalization are: 


— Our formal proof has a different structure than the proof in the technical 
report [7]. Indeed, the technical report first proves the reduction of a weaker 
form of Partition to BHLE and then argues that “omitting” an element yields 
the desired result as it adds stricter constraints. In the formalization we skip 
this intermediate step and directly prove the existence of an appropriate 
reduction function. 

— Steps that seem trivial in the technical report often require a long formal 
proof. What can be reasoned by intuition in a pen-and-paper proof has to 
be elaborated in the formal proof. Intuition is also sometimes used for hand- 
waving over small gaps or imprecisions. 

— Indexing vectors and lists has been a problem in the formalization. In pen- 
and-paper proofs, one can argue easily about “omitting” an element of a 
list even though this is imprecise and often misuses the notation. In the 
formalization one cannot simply skip an index. All indexing functions in the 
formalization have to be total. “Omitting” an element can only be solved by 
re-indexing and re-structuring the lists in the proof. 

— Numbers are interpreted in different number systems during the proof. In 
contrast to the original proof, the formalization has to explicitly state the 
digits for a change of basis and show equivalence. This leads to verbose and 
elaborate proofs. To make proofs easier, we use the concrete basis d = 5
instead of an unspecified basis d > 4 as in [7]. Furthermore, the number M
must use the absolute values of the $a_i$ (omission in the definition of M in [7]).
The formal definition is stated below. 

— The proof involved many arguments about manipulations of huge sums. 
Working with huge sums entails very large proof states on which the
existing proof automation mostly failed. These proof states require detailed
(but still readable) proofs and occasional manual instantiation of theorems.
Another possible solution to get smaller proof states is to introduce local 
abbreviations for subterms. 

Let us have a look at the proof and its difficulties in the formalization in
more detail. We start from a Partition instance $a = a_1, \ldots, a_n$. Note that we
ignore the trivial case n = 0 in this presentation (but deal with it in the formal
proofs); this means $n - 1 \ge 0$. We reduce a to a BHLE instance b as follows:

- Define

$$M = 2 \cdot \left( \sum_{i=1}^{n} |a_i| \right) + 1 \tag{4}$$

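- For $1 \le i < n$ generate a 5-tuple

$$\begin{aligned}
b_{i,1} &= a_i + M \cdot (5^{4i-4} + 5^{4i-3} + 5^{4i-1}) \\
b_{i,2} &= M \cdot (5^{4i-3} + 5^{4i}) \\
b_{i,3} &= M \cdot (5^{4i-4} + 5^{4i-2}) \\
b_{i,4} &= a_i + M \cdot (5^{4i-2} + 5^{4i-1} + 5^{4i}) \\
b_{i,5} &= M \cdot (5^{4i-1}) \\
b_i &= b_{i,1}, b_{i,2}, b_{i,4}, b_{i,5}, b_{i,3}
\end{aligned} \tag{5}$$

Note that $b_{i,3}$ has moved to the last position in $b_i$.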
- For i = n generate only a 4-tuple:

$$\begin{aligned}
b_{n,1} &= a_n + M \cdot (5^{4n-4} + 5^{4n-3} + 5^{4n-1}) \\
b_{n,2} &= M \cdot (5^{4n-3} + 1) \\
b_{n,4} &= a_n + M \cdot (5^{4n-2} + 5^{4n-1} + 1) \\
b_{n,5} &= M \cdot (5^{4n-1}) \\
b_n &= b_{n,1}, b_{n,2}, b_{n,4}, b_{n,5}
\end{aligned} \tag{6}$$

Note that
  - $b_{n,3}$ is omitted from $b_n$ to restrict the constraints necessary for the proof
    and
  - in $b_{n,2}$ and $b_{n,4}$ the last summand changes to a +1 in comparison to
    the other $b_{i,2}$ and $b_{i,4}$.


In summary, the entry $b_{i,3}$ is uniformly in the last position in the $b_i$ but omitted
from the final $b_n$.
The Partition instance a of length n is reduced to a vector b of length $5n - 1$:

$$b = (b_1, \ldots, b_{n-1}, b_n) \tag{7}$$
The NP-hardness proof now follows in three steps: 


1. We need to show an auxiliary lemma. 

2. We show that a YES-instance of Partition is reduced to a YES-instance of 
BHLE. 

3. We show that the pre-image of a YES-instance of BHLE is indeed a YES- 
instance in Partition. 

4.1 Auxiliary Lemma

As a first step, the proof needs a short auxiliary lemma from number theory.


Lemma 1. Let $x, y, c \in \mathbb{Z}^n$ and M be an integer. Assume that $M > \sum_i |x_i|$
and that $|c_i| \le 1$ for all $1 \le i \le n$. Furthermore, let the following equation hold:

$$\sum_{i=1}^{n} c_i \cdot (x_i + M \cdot y_i) = 0 \tag{8}$$

Then we have

$$\langle c, x \rangle = 0 \quad \text{and} \quad \langle c, y \rangle = 0$$

In this lemma, we can reinterpret $x_i + M \cdot y_i$ from (8) as a number in basis M
with lowest digit $x_i$. Even with a coefficient $c_i$, the lowest digit in basis M has
to be zero, as well as the rest. By splitting off the lowest digits consecutively, we
can show that indeed all digits in basis M have to equal zero.
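
One way to spell out the digit argument as a short calculation is the following sketch. It uses only the assumptions of Lemma 1 and is our paraphrase, not necessarily the structure of the formal proof.

```latex
% Rearranging (8):
\langle c, x \rangle = - M \cdot \langle c, y \rangle
% so <c,x> is an integer multiple of M. On the other hand, |c_i| <= 1 gives
|\langle c, x \rangle| \le \sum_{i=1}^{n} |c_i| \cdot |x_i| \le \sum_{i=1}^{n} |x_i| < M
% The only multiple of M with absolute value below M is 0, hence
\langle c, x \rangle = 0
% Then M * <c,y> = 0, and M > \sum_i |x_i| >= 0 implies M is nonzero, so
\langle c, y \rangle = 0
```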


4.2 $a \in$ Partition $\Longrightarrow$ $b \in$ BHLE


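This direction is quite easy. Let $a_1, \ldots, a_n$ be a YES-instance of Partition with
partitioning set I. We will show that the following vector x is a solution to the
corresponding BHLE:

$$x = (x_1, \ldots, x_{n-1}, x_n)$$

$$x_i = \begin{cases}
1, -1, 0, -1, 0 & i \in I \wedge n \in I \\
0, 0, -1, 1, 1 & i \in I \wedge n \notin I \\
0, 0, -1, 1, 1 & i \notin I \wedge n \in I \\
1, -1, 0, -1, 0 & i \notin I \wedge n \notin I
\end{cases} \qquad \text{for } 1 \le i < n$$

$$x_n = 1, -1, 0, -1$$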


We have to show that $\langle b, x \rangle = 0$. This is proven by plugging in the definitions
and rearranging terms in the sum of the scalar product such that they cancel
out. As a last step in the proof, we need to show that $\|x\|_\infty \le 1$. For the infinity
norm this is quite easy. However, it would not be true for other norms. For $p \ge 1$
and $p < \infty$ we have for $n > 1$:

$$\|x\|_p = \sqrt[p]{3n} > 1$$

Thus, the chosen constraints x only work in infinity norm.
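
The construction of b from Eqs. (4)-(7) and the solution vector of this subsection can be replayed on small inputs. The Python sketch below is our own illustration under stated assumptions: indices are 1-based as in the text, the bound of the BHLE instance is taken to be k = 1 (the bound shown for x in this subsection), and the function names are hypothetical, not those of the Isabelle formalization.

```python
def reduce_partition_to_bhle(a):
    """Build (b, k) from a Partition instance a with len(a) >= 1, following (4)-(7)."""
    n = len(a)
    M = 2 * sum(abs(ai) for ai in a) + 1
    b = []
    for i in range(1, n + 1):
        ai = a[i - 1]
        # For i = n the summand 5^(4i) is replaced by 1, see (6).
        last = 1 if i == n else 5 ** (4 * i)
        b1 = ai + M * (5 ** (4 * i - 4) + 5 ** (4 * i - 3) + 5 ** (4 * i - 1))
        b2 = M * (5 ** (4 * i - 3) + last)
        b3 = M * (5 ** (4 * i - 4) + 5 ** (4 * i - 2))
        b4 = ai + M * (5 ** (4 * i - 2) + 5 ** (4 * i - 1) + last)
        b5 = M * 5 ** (4 * i - 1)
        # b_{i,3} goes last and is dropped entirely for i = n.
        b += [b1, b2, b4, b5] + ([b3] if i < n else [])
    return b, 1


def solution_from_partition(n, I):
    """Solution vector x for a partitioning set I of 1-based indices."""
    x = []
    for i in range(1, n):
        same_side_as_n = (i in I) == (n in I)
        x += [1, -1, 0, -1, 0] if same_side_as_n else [0, 0, -1, 1, 1]
    return x + [1, -1, 0, -1]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


a = [1, 2, 3]                      # partitioned by I = {3}: 3 = 1 + 2
b, k = reduce_partition_to_bhle(a)
x = solution_from_partition(len(a), {3})
```

Here `dot(b, x)` evaluates to 0 and every entry of x has absolute value at most 1, so x witnesses the BHLE instance. The vector has 3n nonzero entries, which is the source of the value $\sqrt[p]{3n}$ above.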


4.3 $a \in$ Partition $\Longleftarrow$ $b \in$ BHLE


This direction is harder. Let b be a YES-instance of BHLE. That is, there exists
a nonzero x such that $\langle b, x \rangle = 0$ and $\|x\|_\infty \le 1$. We have to show that there is
a partition I on $a_1, \ldots, a_n$ with $\sum_{i \in I} a_i = \sum_{i \in \{1 \ldots n\} \setminus I} a_i$.



The proof idea works as follows. First, we apply the auxiliary lemma and
get a constraint on the $a_i$ on the one hand, and a condition on the $x_i$ with
coefficients that are powers of 5 on the other hand. Using this condition on the
$x_i$, we generate equational constraints on the entries of x by looking at the digits
in basis 5. We argue that a number equals zero if and only if all its digits are
zero.

The generated equations lead to a good characterisation of x, namely the
weight $w = x_{5(n-1)+1}$. From the assumption that $\|x\|_\infty \le 1$, we deduce $|w| \le 1$.
Again, this step can only be reasoned in the infinity norm. For other p-norms, this
argumentation breaks as we need the property $|w| \le 1$ to complete the proof.
Using the value of w, we can construct a partitioning set I with the required
property from the equation on the $a_i$.


5 SVP 


Knowing that the BHLE is indeed an NP-hard problem, we reduce it to the 
SVP. Then we can conclude that the SVP in infinity norm is NP-hard. 


Theorem 3. There is a reduction from BHLE to the SVP in infinity norm. 


Again some difficulties were met when formalizing the proof for the above 
theorem. First of all, note that the terminology in [7] and nowadays is a bit 
different. In [7], the shortest vector problem only denotes the shortest vector 
problem in the Euclidean norm. What we call the shortest vector problem in 
the infinity norm is named closest vector problem in [7]. To make terminology 
even more confusing, our understanding of the closest vector problem is called 
the nearest vector problem in [7]. To make the notation clear, we provide a table 
for reference in Fig. 2. 


technical report [7]      | our notation
--------------------------+-----------------------
closest vector problem    | SVP in infinity norm
shortest vector problem   | SVP in Euclidean norm
nearest vector problem    | CVP


Fig. 2. Notation 


A more mathematical problem encountered was that the reduction itself used 
in [7] was not entirely correct. In the reduction two factors k’ = k+1 and k” were 
introduced. These factors should have certain properties to allow the arguments 
of the reduction proof to go through. However, this is only true when tweaking 
these factors a bit to make the whole proof watertight. We will now have a closer 
look. 
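Given the BHLE instance $b = (b_1, \ldots, b_n)$ and $k$, create the following SVP
instance:

$$
\mathcal{L} = \begin{pmatrix}
1 & & 0 & 0 \\
 & \ddots & & \vdots \\
0 & & 1 & 0 \\
\multicolumn{3}{c}{(k+1)\cdot b} & k''
\end{pmatrix} \cdot \mathbb{Z}^{n+1}, \qquad \tilde{k} = k
$$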


where k” is the factor in question. In the technical report, we have 


$$k'' = 2\cdot(k+1)\cdot\Bigl(\sum_{i=1}^{n} b_i\Bigr) + 1$$


The following example however shows that this factor is not enough. 


Example 3. Consider the BHLE instance given by b = (1,—1) and k = 1. This 
is a YES-instance, since the vector (1,1) yields the expected properties. 
Define the following matrices. 


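$$
B_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -2 & 1 \end{pmatrix}, \qquad
B_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & -2 & 9 \end{pmatrix}, \qquad
B_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 6 & -6 & 25 \end{pmatrix}
$$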


The associated SVP instance is the lattice generated by $B_0$. Then the vector
$(0,0,1)^T$ with infinity norm 1 is a solution to the SVP instance generated by the
basis matrix $B_0$. However, since the last entry is nonzero, this does not provide
a solution for BHLE. Contrary to this example, the proof in the technical report
shows that for all SVP solutions the last entry must be zero.

The reason why the argument in the technical report breaks at this point is
that $b_1 + b_2 = 0$, which makes $k'' = 1$ very small. One step to prevent this is
to use the absolute values of the $b_i$ in $k''$ instead. The new factor $k''_1$ we consider is

$$k''_1 = 2\cdot(k+1)\cdot\Bigl(\sum_{i=1}^{n} |b_i|\Bigr) + 1$$

With this new factor $k''_1$ we get the generating matrix $B_1$, and the vector
$(0,0,1)^T$ is no longer a shortest vector.

Still, this is not enough. Consider the same $b = (1,-1)$ as above, but let
$k = 5$. Then we get $B_2$ as the generating matrix of the SVP lattice. The vector
$x = (0,5,1)^T$ is a shortest vector whose last entry is nonzero. Again this contradicts
the proof in the technical report. The reason this time is the following: the
argument that $(k+1)\bigl(\sum_{i=1}^{n} x_i b_i\bigr)$ and $k''_1$ have different relative sizes fails. Indeed,
we have

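$$
\left\| \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 6 & -6 & 25 \end{pmatrix}
\begin{pmatrix} 0 \\ 5 \\ 1 \end{pmatrix} \right\|_\infty
= \left\| \begin{pmatrix} 0 \\ 5 \\ -5 \end{pmatrix} \right\|_\infty = 5 \le k
$$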
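
The two counterexamples of Example 3 can be replayed numerically. The following is a minimal sketch, not part of the formalization: the helper names (`svp_basis`, `factor_report`, `factor_abs`) are chosen here for illustration, and the basis layout (identity block on top, last row $((k+1)\cdot b,\ k'')$) is the one read off from the matrices of the example.

```python
def svp_basis(b, k, factor):
    """Basis of the SVP instance: identity block on top, last row ((k+1)*b, factor)."""
    n = len(b)
    rows = [[1 if i == j else 0 for j in range(n)] + [0] for i in range(n)]
    rows.append([(k + 1) * bi for bi in b] + [factor])
    return rows


def mat_vec(B, x):
    """Matrix-vector product over the integers."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in B]


def inf_norm(v):
    """Infinity norm: largest absolute entry."""
    return max(abs(c) for c in v)


def factor_report(b, k):
    """Factor k'' as stated in the technical report (plain sum of the b_i)."""
    return 2 * (k + 1) * sum(b) + 1


def factor_abs(b, k):
    """First repair attempt: sum of absolute values of the b_i."""
    return 2 * (k + 1) * sum(abs(bi) for bi in b) + 1


b = [1, -1]
B0 = svp_basis(b, 1, factor_report(b, 1))
B1 = svp_basis(b, 1, factor_abs(b, 1))
B2 = svp_basis(b, 5, factor_abs(b, 5))

# (0,0,1) is within the bound k = 1 for B0 although its last entry is nonzero.
print(inf_norm(mat_vec(B0, [0, 0, 1])))
# With absolute values the same vector exceeds the bound k = 1.
print(inf_norm(mat_vec(B1, [0, 0, 1])))
# For k = 5 the vector (0,5,1) is within the bound again.
print(mat_vec(B2, [0, 5, 1]), inf_norm(mat_vec(B2, [0, 5, 1])))
```

The three printed norms show the pattern discussed in the text: the first and the third vector stay within their bound $k$ despite a nonzero last entry, while the second does not.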


We can obtain different relative sizes of $(k+1)\bigl(\sum_{i=1}^{n} x_i b_i\bigr)$ and $k''_2$ by defining

$$k''_2 = 2\cdot k\cdot(k+1)\cdot\Bigl(\sum_{i=1}^{n} |b_i|\Bigr) + 1 \qquad (9)$$
Now we can make sure that the last entry of a solution to the SVP problem
is indeed zero. For the proof of Theorem 3 we consider the reduction given by

$$
\underbrace{\begin{pmatrix}
1 & & 0 & 0 \\
 & \ddots & & \vdots \\
0 & & 1 & 0 \\
\multicolumn{3}{c}{(k+1)\cdot b} & k''_2
\end{pmatrix}}_{B}
$$

where $B$ denotes the basis matrix generating the lattice $\mathcal{L}$ as given above.
Consider a solution $x = (x_1, \ldots, x_{n+1})$ of the SVP with $\|Bx\|_\infty \le k$. Then
we have


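$$
Bx = \begin{pmatrix}
1 & & 0 & 0 \\
 & \ddots & & \vdots \\
0 & & 1 & 0 \\
\multicolumn{3}{c}{(k+1)\cdot b} & k''_2
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \\ x_{n+1} \end{pmatrix}
= \begin{pmatrix} x_1 \\ \vdots \\ x_n \\ (k+1)\bigl(\sum_{i=1}^{n} x_i b_i\bigr) + x_{n+1}\cdot k''_2 \end{pmatrix}
$$


As this yields a solution to the SVP, we get:

$$\Bigl| (k+1)\Bigl(\sum_{i=1}^{n} x_i b_i\Bigr) + x_{n+1}\cdot k''_2 \Bigr| \le k \qquad (10)$$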


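Then we calculate:

$$
\begin{aligned}
(k+1)\Bigl(\sum_{i=1}^{n} x_i b_i\Bigr) + x_{n+1}\cdot k''_2
 &\le (k+1)\Bigl(\sum_{i=1}^{n} |x_i|\,|b_i|\Bigr) + x_{n+1}\cdot k''_2 \\
 &\le (k+1)\,k\,\Bigl(\sum_{i=1}^{n} |b_i|\Bigr) + x_{n+1}\cdot k''_2
\end{aligned}
$$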


Assuming that $x_{n+1} \ne 0$, we have

$$
(k+1)\,k\,\Bigl(\sum_{i=1}^{n} |b_i|\Bigr) < 2\cdot k\cdot(k+1)\cdot\Bigl(\sum_{i=1}^{n} |b_i|\Bigr) + 1 = k''_2 \le |x_{n+1}|\cdot k''_2
$$


Thus the two summands indeed have different relative sizes and can never cancel
out the other summand. This leads to a contradiction to (10). Therefore, $x_{n+1} = 0$
must be true and $(x_1, \ldots, x_n)$ constitutes a solution to the BHLE when using
$k''_2$ as in (9).
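
The effect of the corrected factor can be checked by exhaustive search on small instances. The sketch below is an illustration under stated assumptions, not the formal proof: the function names are hypothetical, `factor_k1` and `factor_k2` encode the two factors with absolute values as read from the text and from (9), and the search enumerates every integer vector $x$ with $\|Bx\|_\infty \le k$ (the first $n$ entries of $Bx$ are the $x_i$ themselves, so $|x_i| \le k$, and the last coordinate is bounded by $(|s| + k) / k''$).

```python
from itertools import product


def factor_k1(b, k):
    """Factor with absolute values but without the extra multiplier k."""
    return 2 * (k + 1) * sum(abs(bi) for bi in b) + 1


def factor_k2(b, k):
    """Factor as in (9): additionally multiplied by k."""
    return 2 * k * (k + 1) * sum(abs(bi) for bi in b) + 1


def bounded_solutions(b, k, factor):
    """All nonzero integer x with ||B x||_inf <= k for the SVP basis B built from b, k, factor."""
    n = len(b)
    solutions = []
    for head in product(range(-k, k + 1), repeat=n):
        s = (k + 1) * sum(x * bi for x, bi in zip(head, b))
        # |s + last * factor| <= k forces |last| * |factor| <= |s| + k
        limit = (abs(s) + k) // abs(factor)
        for last in range(-limit, limit + 1):
            if (any(head) or last != 0) and abs(s + last * factor) <= k:
                solutions.append(head + (last,))
    return solutions


b, k = [1, -1], 5
with_k1 = bounded_solutions(b, k, factor_k1(b, k))
with_k2 = bounded_solutions(b, k, factor_k2(b, k))

# With the first factor a solution with nonzero last entry exists.
print((0, 5, 1) in with_k1)
# With the factor from (9) every solution has last entry zero.
print(all(x[-1] == 0 for x in with_k2))
```

For every solution found with the factor from (9), the first $n$ entries also satisfy $\sum_i x_i b_i = 0$, since a multiple of $k+1$ with absolute value at most $k$ must be zero.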


6 Other p-Norms 


Up to now, we have investigated lattice problems under the infinity norm. Even 
though this yields nice hardness results, in practice the Euclidean norm is used 
more often. Unfortunately, when considering p-norms things do not play out as 
nicely. In this section, we assume $1 \le p < \infty$ whenever we talk about a specific $p$.
For the CVP, there is a generalisation of the proof for every p-norm in [15, p.
48, Chap. 3.2, Thm 3.1] which we also formalized. Let $a_1, \ldots, a_n, s$ be an instance
of Subset Sum. The reduction function maps this instance to:

$$
B = \begin{pmatrix}
a_1 & \cdots & a_n \\
2 & & 0 \\
 & \ddots & \\
0 & & 2
\end{pmatrix}, \qquad
b = \begin{pmatrix} s \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad
k = \sqrt[p]{n}
$$


Then the following theorem holds: 


Theorem 4. The above mapping is a reduction from the Subset Sum problem 
to the CVP in p-norm. 


This implies that the CVP in p-norm is an NP-hard problem. The outline to 
the proof is given in Sect. 3 after Theorem 1. The important difference to the 
infinity norm is that the bound k scales with the dimension n of the lattice. 
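
The reduction of Theorem 4 can be illustrated on small Subset Sum instances. The following is a sketch under assumptions, not the formalized proof: the function names are hypothetical, the basis has the row $(a_1, \ldots, a_n)$ on top of $2 \cdot I_n$, the target is $(s, 1, \ldots, 1)$, and distances are compared through their $p$-th powers (bound $n$ instead of $\sqrt[p]{n}$) to stay in integer arithmetic. The search box is exhaustive because any coefficient outside $\{0, 1\}$ contributes at least $3^p$ to the $p$-th power of the distance.

```python
from itertools import product


def cvp_instance(a, s):
    """Basis rows (a; 2*I_n), target (s, 1, ..., 1), and n as the p-th power of the bound."""
    n = len(a)
    basis = [list(a)] + [[2 if i == j else 0 for j in range(n)] for i in range(n)]
    target = [s] + [1] * n
    return basis, target, n


def dist_p_power(basis, target, x, p):
    """p-th power of the p-norm distance between basis*x and target."""
    v = [sum(r * xi for r, xi in zip(row, x)) for row in basis]
    return sum(abs(vi - ti) ** p for vi, ti in zip(v, target))


def cvp_yes(a, s, p, box=2):
    """Is there an integer x in the search box with ||Bx - b||_p^p <= n?"""
    basis, target, bound_p = cvp_instance(a, s)
    return any(
        dist_p_power(basis, target, x, p) <= bound_p
        for x in product(range(-box, box + 1), repeat=len(a))
    )


def subset_sum_yes(a, s):
    """Brute-force Subset Sum: does some subset of a sum to s?"""
    return any(
        sum(ai for ai, bit in zip(a, bits) if bit) == s
        for bits in product([0, 1], repeat=len(a))
    )


a = [3, 5, 8]
for s in (11, 7):
    print(s, subset_sum_yes(a, s), [cvp_yes(a, s, p) for p in (1, 2, 3)])
```

For each tested $s$ the CVP answer agrees with the Subset Sum answer for all three values of $p$, and the bound $n$ on the $p$-th power makes the dependence on the dimension visible.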

For the SVP, there is no known deterministic NP-hardness result in the 
Euclidean norm, or even any p-norm. However, Ajtai [1,2] found an interesting 
alternative which is quite useful for the application in cryptography, namely 
randomized reductions using polynomial-time probabilistic reduction functions. 
In cryptography, these results guarantee the hardness of “average” cases. That 
is, given an average instance according to a probability distribution, it will most 
likely be intractable. 


7 Time Complexity 


As stated in Sect. 2, time complexity of the above reduction functions has not 
been formalized. However, we give a short explanation why all reduction func- 
tions are indeed in polynomial time. 


Subset Sum to CVP: The reduction function as given in Eq. (1) creates
$(n+2)(n+1)+1$ values using only memory access or one addition. Therefore,
the time complexity in this case is $O(n^2)$.


Partition to BHLE: In this case, the reduction function maps the input $a$ of
length $n$ to $b$ as defined in Eq. (7). The value $k = 1$ is fixed. Then $a$ is mapped
to a vector of length $5n - 1$. When calculating the $b_i$, we need to calculate the
value of $M$ as in (4). As we sum over all input values, this lies in $O(n)$. Each
$b_i$ can then be calculated in $O(n)$ since it only contains a constant number of
additions of the input with fixed cofactors (see (5)-(6)). Putting the construction
of the list and the calculation of the $b_i$ together, we find that the whole reduction
function is in $O(n^2)$.


BHLE to the SVP: Consider the reduction function as given in Eq. (5) using
the value $k''_2$ as in (9). Calculating $k''_2$ requires $n + 2$ memory accesses which
are processed in $n + 4$ arithmetic operations, thus having a time complexity of
$O(n)$. Every other entry in the matrix is calculated in $O(1)$, since each contains
at most two memory accesses and at most two arithmetic operations. The input
generates $(n+1)^2 + 1$ values, of which $(n+1)(n+1)$ are in $O(1)$ (namely all the
zeros and ones, the vector $(k+1)\cdot b$ and the constraint $k$) and one is calculated
in $O(n)$ (namely $k''_2$). Thus, the whole reduction function lies in $O(n^2)$.


8 Outlook 


With this paper, we now have a formal proof for NP-hardness of the CVP and
SVP in the infinity norm, as well as a formal proof of NP-hardness of the CVP
in $p$-norm (for $1 \le p < \infty$). In the formalization process, many gaps and
imprecisions in the pen-and-paper proofs were fixed. The changes to the original
proofs have been elaborated with explanations and examples. Unfortunately,
giving a deterministic reduction proof of the SVP in $p$-norm for $p < \infty$ is still
an open problem. Under probabilistic assumptions, Ajtai showed NP-hardness of
the SVP in Euclidean norm in [2].

An interesting topic for future work is to develop a framework for probabilistic 
reductions such as in [2]. This will give the foundation to extend formalization 
of hardness proofs to other problems in lattice theory, especially those used in 
lattice-based cryptography, such as the Learning with Errors (LWE) Problem, 
Ring-LWE and Module-LWE. This will underline the security of many lattice- 
based crypto systems. Another topic for future work is to formalize the hardness 
proofs for approximate versions of the CVP and SVP. 
Abstract. We are interested in widening the reasoning support for
propositional modal logics in the so-called modal cube. The modal cube 
consists of extensions of the basic modal logic K with an arbitrary com- 
bination of the modal axioms B, D, T, 4 and 5. We revisit recently devel- 
oped local reductions from all logics in the modal cube to a normal form 
comprising sets of clausal formulae with associated modal levels. We 
extend these reductions further to the basic modal logic K, called defini- 
tional reductions. This enables any prover for K to be used to solve the 
satisfiability problem for all logics in the modal cube. We also present 
alternative, axiomatic, reductions based on ideas originally proposed by 
Kracht, providing new theoretical results and improved bounds on the 
size of the reductions. We compare both sets of reductions combined with 
state-of-the-art provers for K on a large set of parametric benchmarks 
for all logics in the modal cube. The results show that the provers per- 
form better with reductions based on the clausal normal form than the 
axiomatic reductions. 


1 Introduction 


Following [4], modal logics can be seen as simple but expressive languages for 
talking about relational structures that provide an internal and local perspective 
on those structures. The most intensively studied modal logics are the basic 
modal logic K and its extensions with one or more of the axioms B (symmetry), 
D (seriality), T (reflexivity), 4 (transitivity) and 5 (Euclideanness), that form
the so-called modal cube. There are numerous reasons for this. To name just
three: (i) relations which are serial, symmetric, transitive, etc. are very common; 
(ii) the logics in the modal cube can be used to represent and reason about 
idealised mental attitudes such as knowledge, belief, desire and intention; (iii) 
mathematical techniques, algorithms, calculi, as well as implemented reasoning 
tools for these logics provide building blocks for the study and application of 
more complex modal logics. 

In [27], we have presented a reduction from each of the 15 distinct logics in
the modal cube to Separated Normal Form with Sets of Modal Levels, $\mathrm{SNF}_{sml}$, a
clausal normal form for basic modal logic in which clauses are labelled with
possibly infinite sets of modal levels, and to Separated Normal Form with Modal
Levels, $\mathrm{SNF}_{ml}$, where each clause is given a natural number label. The latter
reduction then allowed us to use the modal-layered clausal resolution (MLR)
calculus [22], implemented in the modal logic theorem prover K$_S$P [19,26], to
reason in these logics. We evaluated this approach on a new collection of
benchmark formulae for all 15 logics and compared its performance with that of the
global modal resolution (GMR) calculus also implemented in K$_S$P and with
Leo-III, an automated theorem prover for polymorphic higher-order logic [32]. The
GMR calculus has specific rules for each logic while Leo-III reasons about modal
logics using a translation approach and has translations for each of the 15 logics
built in. The evaluation showed that the approach performs better than Leo-III
but not as well as the GMR calculus in K$_S$P. We identified the reduction from
$\mathrm{SNF}_{sml}$ to $\mathrm{SNF}_{ml}$ as the main contributing factor, in particular, on satisfiable
formulae where the MLR calculus has to fully saturate the corresponding set of
$\mathrm{SNF}_{ml}$ clauses up to redundancy before it can conclude that the original formula
is satisfiable.

In this paper, we investigate and evaluate an alternative use of our reductions
from logics in the modal cube to $\mathrm{SNF}_{ml}$. A finite set of clauses in $\mathrm{SNF}_{ml}$ can
straightforwardly be transformed into a formula in the basic modal logic K. Such
a transformation then allows any existing approach to solving the satisfiability
problem in K to be applied to the satisfiability problem in all logics in the modal
cube. An advantage of the use of this transformation over a translation from each
of the 15 logics to first-order (or higher-order) logic [1,5,9,14] is the availability of
implemented decision procedures for basic modal logic. In contrast, while many
decidable fragments of first-order logics are known, including decidable fragments
that are suitable targets of translations of modal logic formulae, implemented
decision procedures for these fragments are rare. See also related discussions in
[27,30].

The original motivation for our work on reductions to $\mathrm{SNF}_{sml}$ and $\mathrm{SNF}_{ml}$
was Kracht’s reductions of the normal modal logics KB, KD, KT, and K4 to K
[17,18]. Extending our reduction from $\mathrm{SNF}_{ml}$ to K to obtain a reduction from
the modal cube to K raises first the question whether one can devise a reduction
based on the same idea as Kracht’s for the remaining logics of the modal cube.
We will call such a reduction axiomatic as the idea is to use certain instances
of axiom schemata embedded into modal contexts of nested $\Box$-operators up
to a certain depth bound. We answer this question positively by providing the
reductions missing in Kracht’s work. The second question then raised is how well
provers for K perform on our reduction compared to an axiomatic reduction. Our
empirical evaluation indicates that the definitional reduction appears to result
in better performance overall when combined with state-of-the-art K provers.
The structure of the paper is as follows. In Sect. 2 we recall common
concepts of propositional modal logics and the definition of our normal form $\mathrm{SNF}_{sml}$.
Section 3 recalls our reduction from logics in the modal cube to $\mathrm{SNF}_{ml}$, defines
the transformation of a finite set of $\mathrm{SNF}_{ml}$ clauses to basic modal logic, and
introduces the axiomatic reduction for the logics in the modal cube. In Sect. 4 we
compare the performance of the reductions defined in Sect. 3 when combined
with provers for basic modal logic, as well as with the global
resolution calculus for logics in the modal cube implemented in K$_S$P.


2 Preliminaries 


The language of modal logic is an extension of the language of propositional
logic with unary modal operators $\Box$ and $\Diamond$. More precisely, given a denumerable
set of propositional symbols, $P = \{p, p_0, q, q_0, t, t_0, \ldots\}$ as well as propositional
constants true and false, modal formulae are inductively defined as follows:
constants and propositional symbols are modal formulae. If $\varphi$ and $\psi$ are modal
formulae, then so are $\neg\varphi$, $(\varphi \wedge \psi)$, $(\varphi \vee \psi)$, $(\varphi \to \psi)$, $\Box\varphi$, and $\Diamond\varphi$. We also assume
that $\wedge$ and $\vee$ are associative and commutative operators and consider, e.g.,
$(p \vee (q \vee r))$ and $(r \vee (q \vee p))$ to be identical formulae. We often omit parentheses
if this does not cause confusion. The size of $\varphi$ is the number of occurrences
of propositional constants, propositional variables, boolean operators and modal
operators in $\varphi$. By $\mathrm{var}(\varphi)$ we denote the set of all propositional symbols occurring
in $\varphi$. This function easily extends to finite sets of modal formulae. A modal axiom
(schema) is a modal formula $\psi$ representing the set of all instances of $\psi$.

A literal is either a propositional symbol or its negation; the set of literals is
denoted by $L_P$. By $\neg l$ we denote the complement of the literal $l \in L_P$, that is, if
$l$ is the propositional symbol $p$ then $\neg l$ denotes $\neg p$, and if $l$ is the literal $\neg p$ then
$\neg l$ denotes $p$. By $|l|$ for $l \in L_P$ we denote $p$ if $l = p$ or $l = \neg p$. A modal literal is
either $\Box l$ or $\Diamond l$, where $l \in L_P$.

An occurrence of a subformula has positive polarity if it is inside the scope of
an even number of (explicit or implicit) negations, and it has negative polarity
if it is inside the scope of an odd number of negations. A literal is pure if all
its occurrences have either a positive or a negative polarity.

The modal logic K is given by the smallest set of modal formulae which
includes all propositional tautologies and the axiom schema $\Box(\varphi \to \psi) \to (\Box\varphi \to
\Box\psi)$, and is closed under modus ponens and the rule of necessitation (if $\varphi \in K$
then $\Box\varphi \in K$). Given a modal logic $L$ and set of axioms $\Sigma$, the smallest modal
logic $L' \supseteq L \cup \Sigma$ is an extension of $L$ and we denote $L'$ by $L\Sigma$.

The standard semantics of modal logics is the Kripke semantics or possible
world semantics. A Kripke frame $F$ is an ordered pair $\langle W, R\rangle$ where $W$ is a
non-empty set of worlds and $R$ is a binary (accessibility) relation over $W$. A Kripke
structure $M$ over $P$ is an ordered pair $\langle F, V\rangle$ where $F$ is a Kripke frame and the
valuation $V$ is a function mapping each propositional symbol $p$ in $P$ to a subset
$V(p)$ of $W$. A rooted Kripke structure is an ordered pair $\langle M, w_0\rangle$ with $w_0 \in W$.

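Satisfaction (or truth) of a formula at a world $w$ of a Kripke structure $M =
\langle W, R, V\rangle$ is inductively defined by:


$$
\begin{array}{lcl}
\multicolumn{3}{l}{(M,w) \models \mathbf{true}; \quad (M,w) \not\models \mathbf{false};} \\
(M,w) \models p & \text{iff} & w \in V(p), \text{ where } p \in P; \\
(M,w) \models \neg\varphi & \text{iff} & (M,w) \not\models \varphi; \\
(M,w) \models (\varphi \wedge \psi) & \text{iff} & (M,w) \models \varphi \text{ and } (M,w) \models \psi; \\
(M,w) \models (\varphi \vee \psi) & \text{iff} & (M,w) \models \varphi \text{ or } (M,w) \models \psi; \\
(M,w) \models (\varphi \to \psi) & \text{iff} & (M,w) \models \neg\varphi \text{ or } (M,w) \models \psi; \\
(M,w) \models \Box\varphi & \text{iff} & \text{for every } v,\ w \mathrel{R} v \text{ implies } (M,v) \models \varphi; \\
(M,w) \models \Diamond\varphi & \text{iff} & \text{there is } v,\ w \mathrel{R} v \text{ and } (M,v) \models \varphi.
\end{array}
$$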


If $(M, w) \models \varphi$ then we say that $\varphi$ is true at $w$ in $M$. A rooted Kripke structure
$\mathcal{M} = \langle M, w_0\rangle$ is a model of a modal formula $\varphi$ iff $(M, w_0) \models \varphi$; in this case we
also say that $\mathcal{M}$ satisfies $\varphi$. A modal formula $\varphi$ is satisfiable iff there exists a
Kripke structure $M$ and a world $w$ in $M$ such that $(M, w) \models \varphi$. A rooted Kripke
structure $M = \langle W, R, V, w_0\rangle$ is a rooted tree Kripke structure iff $R$ is a tree, that
is, a directed acyclic connected graph where each node has at most one
predecessor, with root $w_0$.
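
The satisfaction relation can be turned into a small model checker for finite Kripke structures. The following is a minimal sketch for illustration only: the tuple encoding of formulae and the function name `holds` are assumptions made here, not notation from the paper.

```python
def holds(W, R, V, w, f):
    """Decide (M, w) |= f for M = (W, R, V); formulae are nested tuples."""
    op = f[0]
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "var":
        return w in V.get(f[1], set())
    if op == "not":
        return not holds(W, R, V, w, f[1])
    if op == "and":
        return holds(W, R, V, w, f[1]) and holds(W, R, V, w, f[2])
    if op == "or":
        return holds(W, R, V, w, f[1]) or holds(W, R, V, w, f[2])
    if op == "imp":
        return (not holds(W, R, V, w, f[1])) or holds(W, R, V, w, f[2])
    if op == "box":  # true at every R-successor of w
        return all(holds(W, R, V, v, f[1]) for v in W if (w, v) in R)
    if op == "dia":  # true at some R-successor of w
        return any(holds(W, R, V, v, f[1]) for v in W if (w, v) in R)
    raise ValueError(f"unknown operator: {op}")


# A rooted tree structure with root 0 and two successors 1 and 2.
W = {0, 1, 2}
R = {(0, 1), (0, 2)}
V = {"p": {1, 2}, "q": {1}}
p, q = ("var", "p"), ("var", "q")

print(holds(W, R, V, 0, ("box", p)))          # p holds at both successors
print(holds(W, R, V, 0, ("box", q)))          # q fails at world 2
print(holds(W, R, V, 0, ("dia", q)))          # q holds at world 1
print(holds(W, R, V, 1, ("box", ("false",)))) # world 1 has no successors
```

Note that a world without successors satisfies every formula of the form $\Box\varphi$ and no formula of the form $\Diamond\varphi$, which is exactly what the last call exercises.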

A path from $w_0$ to $w_k$ of length $k$, $k \ge 0$, in a frame $F = \langle W, R\rangle$ is a sequence
$(w_0, w_1, \ldots, w_k)$ where for every $i$, $0 \le i \le k-1$, $w_i \mathrel{R} w_{i+1}$. A path $(w_0)$ of
length 0 is identified with its root $w_0$. In a rooted tree Kripke structure $M$ with
root $w_0$, for every world $w_k \in W$ there is exactly one path connecting $w_0$ and
$w_k$; the modal level (in $M$) of $w_k$, denoted by $\mathrm{ml}_M(w_k)$, is given by the length of the
path from $w_0$ to $w_k$. More generally, for a rooted Kripke structure $M$ with root
$w_0$, the depth of a world $w_k$ (in $M$), denoted by $\mathrm{depth}_M(w_k)$, is the length of the
shortest path from $w_0$ to $w_k$. The depth of $M$ is the maximal depth of a world
in $M$. The outdegree of a world $w$ in $F$ is given by $|\{w' \mid w \mathrel{R} w'\}|$.

The 15 logics in the modal cube consist of K itself and its extensions with
one or more of the modal axioms shown in Table 1. Each of these axioms defines
a class of Kripke frames where the accessibility relation $R$ satisfies the
first-order property stated in the table. Combinations $\Sigma$ of axioms then define a class
$\mathcal{F}_\Sigma$ of Kripke frames where the accessibility relation satisfies the combination
of their corresponding properties. Given a logic $L = K\Sigma$, a modal formula $\varphi$ is


Table 1. Modal axioms and relational frame properties 
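Name | Axiom                                | Frame Property
D    | $\Box\varphi \to \Diamond\varphi$    | $\forall v\,\exists w.\ v \mathrel{R} w$ (Serial)
T    | $\Box\varphi \to \varphi$            | $\forall w.\ w \mathrel{R} w$ (Reflexive)
B    | $\varphi \to \Box\Diamond\varphi$    | $\forall v\,w.\ v \mathrel{R} w \to w \mathrel{R} v$ (Symmetric)
4    | $\Box\varphi \to \Box\Box\varphi$    | $\forall u\,v\,w.\ (u \mathrel{R} v \wedge v \mathrel{R} w) \to u \mathrel{R} w$ (Transitive)
5    | $\Diamond\varphi \to \Box\Diamond\varphi$ | $\forall u\,v\,w.\ (u \mathrel{R} v \wedge u \mathrel{R} w) \to v \mathrel{R} w$ (Euclidean)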
Table 2. Rewriting Rules for Simplification


ypvnp>y p A 7p => false Op V Ong => Otrue 
ypVvep>y pV ay => true yA Ong false 
true > false yA true > p Ofalse \ Oy => false 
false > true y A false > false Otrue V Oy => true 
“4p > p y V false > y yp ^ Ony => false 
true > true y V true > true false \ Oy false 
Ofalse => false Otrue V Oy > Otrue 


L-satisfiable iff there exists a frame $F \in \mathcal{F}_\Sigma$, a valuation $V$ and a world $w \in F$
such that $M = \langle F, V, w\rangle \models \varphi$, and we call $M$ an L-model of $\varphi$.
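
The frame properties of Table 1 are directly checkable on finite frames. The sketch below is an illustration, with hypothetical function names, of how membership of a finite frame in the frame class of a logic $K\Sigma$ could be tested by conjoining the first-order conditions of the axioms in $\Sigma$.

```python
def is_serial(W, R):
    return all(any((v, w) in R for w in W) for v in W)


def is_reflexive(W, R):
    return all((w, w) in R for w in W)


def is_symmetric(W, R):
    return all((w, v) in R for (v, w) in R)


def is_transitive(W, R):
    return all((u, w) in R for (u, v) in R for (v2, w) in R if v == v2)


def is_euclidean(W, R):
    return all((v, w) in R for (u, v) in R for (u2, w) in R if u == u2)


# Axiom name -> frame property, following Table 1.
FRAME_PROPERTY = {
    "D": is_serial,
    "T": is_reflexive,
    "B": is_symmetric,
    "4": is_transitive,
    "5": is_euclidean,
}


def in_frame_class(W, R, axioms):
    """Does the frame (W, R) satisfy the properties of all axioms in `axioms`?"""
    return all(FRAME_PROPERTY[a](W, R) for a in axioms)


W = {0, 1}
R = {(0, 1), (1, 1)}
print(in_frame_class(W, R, "D45"))  # serial, transitive and Euclidean
print(in_frame_class(W, R, "T"))    # world 0 is not reflexive
print(in_frame_class(W, R, "B"))    # (1, 0) is missing
```

The example frame is a typical KD45 frame: the root sees a reflexive cluster but is not part of it.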

A modal formula is in simplified NNF (denoted by $\mathrm{nnf}(\varphi)$), if it has been
simplified by exhaustively applying the rewrite rules in Table 2, and it is in Negation
Normal Form (NNF), that is, a formula where only propositional symbols are
allowed in the scope of negations.

The reductions given in the next section produce formulae in a clausal normal
form, called Separated Normal Form with Sets of Modal Levels, $\mathrm{SNF}_{sml}$, given in
[29]. The language of $\mathrm{SNF}_{sml}$ extends that of the basic modal logic K with sets
of modal levels as labels. Clauses in $\mathrm{SNF}_{sml}$ have one of the following forms:


$S : \bigvee_{i=1}^{n} l_i$ (literal clause)
$S : l' \to \Box l$ (positive modal clause)
$S : l' \to \Diamond l$ (negative modal clause)


where $S \subseteq \mathbb{N}$ and $l, l', l_i$ are propositional literals with $1 \le i \le n$, $n \in \mathbb{N}$. We
write $\star : \varphi$ instead of $\mathbb{N} : \varphi$ and such clauses are called global clauses. Positive
and negative modal clauses are together known as modal clauses.

Given a rooted tree Kripke structure $M$ and a set $S$ of natural numbers,
by $M[S]$ we denote the set of worlds that are at a modal level in $S$, that is,
$M[S] = \{w \in W \mid \mathrm{ml}_M(w) \in S\}$. Then


$M \models S : \varphi$ iff $(M, w) \models \varphi$ for every world $w \in M[S]$.


The use of sets as labels allows a concise representation of clauses that might 
hold in a possibly infinite number of levels. 

If $M \models S : \varphi$, then we say that $S : \varphi$ holds in $M$ or is true in $M$. For a set
$\Phi$ of labelled formulae, $M \models \Phi$ iff $M \models S : \varphi$ for every $S : \varphi$ in $\Phi$, and we say $\Phi$
is K-satisfiable.

We introduce some notation that will be used in the following. For $m, n \in \mathbb{N}$,
$m \le n$, let $[m..n] = \{m, \ldots, n\} \subseteq \mathbb{N}$. Let $S^{+} = \{l + 1 \in \mathbb{N} \mid l \in S\}$, $S^{-} =
\{l - 1 \in \mathbb{N} \mid l \in S\}$, and $S^{\ge} = \{n \in \mathbb{N} \mid n \ge \min(S)\}$, where $\min(S)$ is the
least element in $S$. Note that the restriction of the elements being in $\mathbb{N}$ implies
that $S^{-}$ cannot contain negative numbers.

A formula is in Separated Normal Form with Modal Levels ($\mathrm{SNF}_{ml}$) [22,23],
if it is a conjunction of clauses in one of the following forms:


$ml : \bigvee_{i=1}^{n} l_i$ (literal clause)
$ml : l' \to \Box l$ (positive modal clause)
$ml : l' \to \Diamond l$ (negative modal clause)

where $ml \in \mathbb{N} \cup \{\star\}$ and $l, l', l_i$ are propositional literals with $1 \le i \le n$, $n \in \mathbb{N}$.
Effectively, this normal form corresponds to a restriction on $\mathrm{SNF}_{sml}$ where
the sets are singletons or $\star$, representing all levels.


3 Reductions 


3.1 Definitional Reduction 


In [27] we introduced a reduction $\rho_L^{sml}$ that, for any modal logic $L = K\Sigma$
with $\Sigma \subseteq \{B, D, T, 4, 5\}$, transforms a modal formula $\varphi$ in simplified NNF to
a finite set $\Phi_L^{sml}$ of clauses in $\mathrm{SNF}_{sml}$ such that $\varphi$ is L-satisfiable iff $\Phi_L^{sml}$ is
K-satisfiable. For K4, K5 and their extensions by further axioms, $\rho_L^{sml}$ produces sets
of clauses where the labelling sets $S$ are potentially infinite. However, depending
on syntactic properties of $\varphi$ it is possible to impose upper bounds on the maximal
modal level that occurs in those sets so that the reduction remains satisfiability
preserving. Table 3 shows such a bound for each logic in the modal cube. In the
table and in the following, for a modal formula $\varphi$ in simplified NNF, (i) $d_m^{\varphi}$ is the
modal depth of $\varphi$, (ii) $d_\Diamond^{\varphi}$ is the maximal nesting of $\Diamond$-operators not in the scope
of any $\Box$-operators in $\varphi$, (iii) $n_\Box^{\varphi}$ is the number of $\Box$-subformulae in $\varphi$, and (iv)
$n_\Diamond^{\varphi}$ is the number of $\Diamond$-subformulae below $\Box$-operators in $\varphi$. Using these bounds
it is then possible to define a function $\rho_L^{ml}$ that transforms a modal formula $\varphi$ in
simplified NNF to a finite set $\Phi_L^{ml}$ of clauses in $\mathrm{SNF}_{ml}$ such that $\varphi$ is L-satisfiable
iff $\Phi_L^{ml}$ is K-satisfiable.

Table 4 shows the definitions of modified reductions to $\mathrm{SNF}_{sml}$
and $\mathrm{SNF}_{ml}$, respectively. In contrast to the reduction to $\mathrm{SNF}_{sml}$ introduced in
[27], the modified reduction to $\mathrm{SNF}_{sml}$ already uses the bounds in
Table 3 to ensure that all labelling sets $S$ occurring in the reduction of a modal
formula remain finite. The reduction to $\mathrm{SNF}_{ml}$ then does not enforce further
restrictions, but straightforwardly transforms a finite set of $\mathrm{SNF}_{sml}$ clauses with finite
labelling sets into a finite set of $\mathrm{SNF}_{ml}$ clauses. This presentation of the reduction
of modal formulae to a finite set of clauses in $\mathrm{SNF}_{ml}$ is closer to the
implementation of the process in the prover K$_S$P.

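Given a finite set $\Phi$ of clauses in $\mathrm{SNF}_{ml}$ we can use a function $\tau^{f}$ to obtain
an equivalent modal formula as follows:


$$\tau^{f}(\Phi) = \bigwedge \{\Box^{ml} C \mid ml : C \in \Phi\}$$


where $\Box^{0}\varphi = \varphi$ and $\Box^{n+1}\varphi = \Box\Box^{n}\varphi$.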


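Table 3. Bounds on the maximal modal level in $\mathrm{SNF}_{sml}$ clauses


Logic L                 | Bound $d_L^{sml}(\varphi)$
K, KD, KT, KB, KDB, KTB | $d_m^{\varphi}$
K4, S4                  | $1 + d_\Diamond^{\varphi} + n_\Diamond^{\varphi} \times n_\Box^{\varphi}$
KD4                     | $1 + d_\Diamond^{\varphi} + (\max(1, n_\Diamond^{\varphi}) \times n_\Box^{\varphi})$
KB4, K5, S5, K45        | $1 + d_\Diamond^{\varphi} + n_\Diamond^{\varphi}$
KD5, KD45               | $1 + d_\Diamond^{\varphi} + \max(1, n_\Diamond^{\varphi})$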
PECE) = {ml : Y |S: € p(o) and ml € 3}
PE” (P) = {{0} : to} U pr ({0} : ty > 9) 
where d = dẸ™! (p) as per Table 3 and pẹẸ is defined as follows: 
pi(S:t— true) = 0 
pł (S : t > false) = {S : at} 
PILS : t => (Y1 Ad2)) = {5 : =t V n(Y), S : =t V n(2)} USES, p1) U SECS, p2) 
pi(S:t Y) ={5:tV y} 
if ~ is a disjunction of literals 


pL(S : t > (tr V2) = {5 : =t V (1) V m(w2)} U ECS, w1) U SECS, 2) 
if %1 V we is not a disjunction of literals 


pt(S:t— Ov) = {S : t > On(h)} U LCST, w) 
pr (S:t > Op) = PE(S : t > OW) U ôL (167 (8), Y) 


where 7 and ôf are defined as follows: 


_ Jẹ, if isa literal 5d JÓ, if w is a literal 
nb) = a otherwise 1(5, 0) = pi(S:ty > Y), otherwise 


and functions P#, 1P@ and 15% are defined as follows: 


L P#(S : toy > OW) LP#(S) 184 (S) 
K S : toy > On(w) S St [0..d] 
KB |S: toy > On(w), S (ST USt) 
S— : n(Y) V tontoy, S- : tostoy > Ortoy Qo. .d] 
K4 || SŽ n[o0..d] : toy > On(w), SŽ s+) 
SŽ A [0..d] : toy > Otoy N[0..d] N[0. .d] 
K5 0. .d] : toy —> On (4), 0..d] [0..d 
0..d|: “totoy V toy, [0. .d] : totoy => Otay; 
0..d]: totoy — O-toy, (0. . d] ; totoy — totuy 
KB4 ||[0..d] : toy > On(w), 0..d] [0..d 
0. . d] : n(Y) V tovtg,,  [0..d] : toy V tanta, 
0. .d k to-toy = “toy, [0. .d] H toy =+ toy 
K45 | [0..d] : toy > On(w), {0} : toy > Otoy iff 0 € S,|[0..d] [0..d 
0..d|: “totoy V toy, [0. .d] : totoy => Otay; 
0. .d] : totoy =? “toy, [0. .d] š totoy a i totuy 
KDX|{IPf5(9) : toy > On(Y)} U Pk (S : tay > Oy) = ld (S) 
KTS ||{IPés(S) : stow V n(Y)} U Pés(S : toy > On) = léés(S)US 


Table 4. p?’'- and pyt'-reductions of modal formulae to SNF,,,; and SNF, respec- 
tively, X C {B, 4, 5}. 


Buy One Get 14 Free: Evaluating Local Reductions for Modal Logic 389 


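A smaller equivalent formula can be constructed as follows. For a finite set
$\Phi$ of clauses in $\mathrm{SNF}_{ml}$ let $\Phi[ml] = \{C \mid ml : C \in \Phi\}$ and $ml_{max} = \max\{ml \mid ml :
C \in \Phi\}$. Then

$$\tau^{n}(\Phi) = \bigwedge \Phi[0] \wedge \Box\bigl(\bigwedge \Phi[1] \wedge \Box\bigl(\bigwedge \Phi[2] \wedge \cdots \wedge \Box\bigl(\bigwedge \Phi[ml_{max}]\bigr) \cdots \bigr)\bigr) \qquad (1)$$


Combining $\rho_L^{ml}$ and $\tau^{n}$ we can define a reduction $\rho_L^{def}$ as


$$\rho_L^{def}(\varphi) = \tau^{n}(\rho_L^{ml}(\varphi))$$
which we call the definitional reduction of $\varphi$ for the modal logic L.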
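
The difference between the flat and the nested transformation of a clause set into a K formula can be made concrete with a small string-based sketch. Everything below is illustrative: the names `tau_flat` and `tau_nested`, the representation of a clause as a pair of a modal level and a formula string, and the sample clause set are assumptions made here, and the clause set is assumed to contain no global clauses.

```python
def tau_flat(clauses):
    """Prefix every clause ml : C with ml boxes and conjoin the results."""
    return " ∧ ".join("□" * ml + "(" + c + ")" for ml, c in clauses)


def tau_nested(clauses):
    """Group clauses by modal level and nest one box per level."""
    ml_max = max(ml for ml, _ in clauses)
    result = None
    for level in range(ml_max, -1, -1):
        parts = ["(" + c + ")" for ml, c in clauses if ml == level]
        if result is not None:
            parts.append("□(" + result + ")")
        result = " ∧ ".join(parts)
    return result


clauses = [
    (0, "t0"),
    (0, "¬t0 ∨ t1"),
    (1, "t1 → □p"),
    (2, "p ∨ q"),
    (2, "¬q"),
]

flat = tau_flat(clauses)
nested = tau_nested(clauses)
print(flat)
print(nested)
print(flat.count("□"), nested.count("□"))
```

In the flat form every clause at level $ml$ carries its own prefix of $ml$ boxes, whereas the nested form shares one box per level among all clauses at deeper levels, which is where the size reduction comes from.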


Theorem 1 ([30]). Let $L = K\Sigma$ with $\Sigma \subseteq \{B, D, T, 4, 5\}$ and $\varphi$ be a modal
formula in simplified NNF. Then $\varphi$ is L-satisfiable iff $\rho_L^{def}(\varphi)$ is K-satisfiable.


This reduction allows us to use any reasoner for the basic modal logic K as a 
reasoner for all the logics in the modal cube. 


3.2 Axiomatic Reduction 


The reductions $\rho_L^{sml}$ and $\rho_L^{ml}$ in [27] were developed as an alternative to and
improvement on reductions from the modal logics KB, KD, KT, and K4 to K
introduced by Kracht [18]. In contrast to $\rho_L^{sml}$ and $\rho_L^{ml}$, which require modal
formulae to be in NNF and treat the modal operators $\Box$ and $\Diamond$ differently, Kracht’s
reductions assume that (i) modal formulae are not necessarily in NNF and (ii)
the only modal operator occurring in modal formulae is $\Box$ and no distinction is
made between positive and negative occurrences of this operator. In the
following we extend Kracht’s reduction to all logics in the modal cube while adhering
to those two assumptions.

Let $\Box^{\le 0}\psi = \psi$ and $\Box^{\le n+1}\psi = (\psi \wedge \Box\Box^{\le n}\psi)$. We can then define a reduction
$\rho_L^{ax}$ for all modal logics L in the modal cube as follows:


$$\rho_L^{ax}(\varphi) = \varphi \wedge \Box^{\le b_L^{ax}(\varphi)} \bigwedge P_L^{ax}(\varphi) \qquad (2)$$


Table 5. Pz*-reduction of O-formulae, X C {B, 4, 5}. 


L (PE (p) bE (9p) 
K {true} dh, 

KB | {(- > O04 | Ov € silg} d, 

K4 | {Oy — God | Oy € sF(y)} në 
K5 | {0-04 > Oy, -0704 > 00%, (oy > 1 

Y) | Ov € sf(y)} 

KB4 | {Ow V 0-04 | Ow € sf(y)} U Pea(y) U Pš (9) 0 

K45 | Palp) U Pes(y 0 
KDY | {=O false} U P#(y) bike (Y) 


KTX | {Op — 4% | Ov € sf(y)} U PH (y) bks (p) 
where $b_L^{ax}(\varphi)$ and $P_L^{ax}(\varphi)$ are as defined in Table 5. We call $\rho_L^{ax}(\varphi)$ the axiomatic
reduction of $\varphi$ for the modal logic L.


Theorem 2. Let $L = K\Sigma$ with $\Sigma \subseteq \{B, D, T, 4, 5\}$ and $\varphi$ be a modal formula
in simplified NNF. Then $\varphi$ is L-satisfiable iff $\rho_L^{ax}(\varphi)$ is K-satisfiable.


Just as the definitional reduction, the axiomatic reduction allows us to use any 
reasoner for basic modal logic as a reasoner for all the logics in the modal cube. 
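
As an illustration of the axiomatic reduction, the following sketch builds the reduced formula for the logic KB only. It rests on assumptions that should be checked against Table 5: the KB row is read here as the axiom instances $\neg\psi \to \Box\neg\Box\psi$ for every subformula $\Box\psi$ of $\varphi$, with the modal depth of $\varphi$ as the bound. The tuple encoding and all function names are hypothetical.

```python
def depth(f):
    """Modal depth of a formula given as a nested tuple."""
    op = f[0]
    if op in ("var", "true", "false"):
        return 0
    if op in ("box", "dia"):
        return 1 + depth(f[1])
    return max(depth(g) for g in f[1:])


def box_subformulae(f):
    """Set of all subformulae of f whose main operator is box."""
    subs = {f} if f[0] == "box" else set()
    for g in f[1:]:
        if isinstance(g, tuple):
            subs |= box_subformulae(g)
    return subs


def conj(formulae):
    """Left-nested conjunction; the empty conjunction is true."""
    formulae = list(formulae)
    if not formulae:
        return ("true",)
    result = formulae[0]
    for g in formulae[1:]:
        result = ("and", result, g)
    return result


def box_leq(n, f):
    """f holds at all levels 0..n: f ∧ □(f ∧ □(... f))."""
    return f if n == 0 else ("and", f, ("box", box_leq(n - 1, f)))


def axiomatic_kb(phi):
    """Assumed KB reduction: phi ∧ □^{<= depth(phi)} of all instances ¬ψ → □¬□ψ."""
    instances = [
        ("imp", ("not", b[1]), ("box", ("not", b)))
        for b in sorted(box_subformulae(phi), key=repr)
    ]
    return ("and", phi, box_leq(depth(phi), conj(instances)))


phi = ("box", ("box", ("var", "p")))
reduced = axiomatic_kb(phi)
print(depth(phi), depth(reduced))
```

For $\varphi = \Box\Box p$ the bound is 2 and the deepest axiom instance has modal depth 3, so the reduced formula has modal depth 5 even though $\varphi$ itself only has depth 2.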


3.3 Discussion 


There are five main differences between the definitional reduction and the 
axiomatic reduction, and between the axiomatic reduction and the work in [18]: 


1. The axiomatic reduction for all logics except the logics KB, KD, KT, K4 is
new. Kracht [18] did define a reduction from K5 to K4, but since K5 is not
a subset of K4, this reduction is not correct. Our definition of the axiomatic
reduction corrects that mistake while remaining close to Kracht’s original
idea by adding instances of 4 at modal levels greater than 0.
The bounds for KB, KD, and KT given in Table 5 are the same as
Kracht’s [18]. However, for K4 he used a bound given by the number of distinct
subformulae of the formula $\varphi$ under consideration. We are able to show that
a bound given by the number of distinct $\Box$-subformulae is sufficient. For the
remaining logics, the bounds are new.

2. The definitional reduction introduces new propositional symbols for complex
subformulae, so-called surrogate propositional symbols. For the modal
resolution calculi implemented in K$_S$P [22,26] this is necessary to obtain the
clausal normal form on which the calculi operate. However, in the context
of our reductions, where we have to add instances of axiom schemata for
subformulae, the use of surrogate propositional symbols offers the advantage
that repeated occurrences of the same complex subformula can be replaced
by the same surrogate symbol. Each surrogate propositional symbol then
requires a definition at every modal level at which it occurs, but overall there
should still be a benefit in relation to the size of the resulting formula.

3. The bounds shown in Table 3 for the definitional reduction and in Table 5
for the axiomatic reduction have different effects on the modal formulae
produced. For the definitional reduction, the modal depth of $\rho_L^{def}(\varphi)$ is at
most $d_L^{sml}(\varphi) + 1$, that is, the bound shown for L in Table 3 plus one. In
contrast, for the axiomatic reduction, $b_L^{ax}$ in Table 5 only states the modal
depth of $\Box^{\le b_L^{ax}} p$ for a propositional symbol $p$, which will then be replaced by
a conjunction of instances of axiom schemata for $\Box$-subformulae of $\varphi$. For all
logics except K and KD, the modal depth of these axiom schemata will be
between $d_m^{\varphi}$ and $d_m^{\varphi} + 2$. Thus, the overall modal depth of $\rho_L^{ax}(\varphi)$ is bounded
by $b_L^{ax} + d_m^{\varphi} + 2$, not just by the bound shown in Table 5.

For example, consider the formula $\Box\Box p$ in KB. Then with the axiomatic
reduction we obtain the formula


$$\Box\Box p \wedge \Box^{\le 2}\bigl((\neg p \to \Box\neg\Box p) \wedge (\neg\Box p \to \Box\neg\Box\Box p)\bigr)$$
which itself is a formula of modal depth 5. With the definitional reduction,
we obtain


$$t_{\Box\Box p} \wedge (t_{\Box\Box p} \to \Box t_{\Box p}) \wedge \Box(t_{\Box p} \to \Box p) \wedge (p \vee t_{\Diamond\neg t_{\Box p}}) \wedge (t_{\Diamond\neg t_{\Box p}} \to \Diamond\neg t_{\Box p})$$


which is a formula of modal depth 2.

Taking this into account we can see that for $L = K\Sigma$ where $\Sigma \subseteq \{B, D, T\}$
we can expect the modal depth of $\rho_L^{def}(\varphi)$ to be less than or equal to that of
$\rho_L^{ax}(\varphi)$, while for the remaining logics of the modal cube it depends on the
individual formula which reduction will produce a formula of greater modal
depth. Nevertheless, for logics such as K4 we expect that the modal depth of
$\rho_{K4}^{ax}(\varphi)$ will often be drastically lower than that of $\rho_{K4}^{def}(\varphi)$.

4. The definitional reduction makes a distinction between $\Box$- and $\Diamond$-operators
and only introduces additional clauses for $\Box$-subformulae. For logics except
KB4, K5 and their extensions, it also carefully tracks at which modal levels
additional clauses are required for which occurrences of surrogate symbols
that were introduced for $\Box$-subformulae. The ‘price’ paid for the fact that
for these logics additional clauses are not also introduced for $\Diamond$-subformulae
is in the higher bounds for the modal levels up to which additional clauses
and definitions of surrogate symbols need to be added. The reason is that
the presence of axiom instances for negative occurrences of $\Box$-subformulae
in the axiomatic reduction for K4, K5 and their extensions allows the
‘back-propagation’ of $\Box$-subformulae that occur negatively, namely, if $\neg\Box\psi$ is true
at a world $w$ at modal level 2 or higher in a tree K-model of $\rho_L^{ax}(\varphi)$, then it
is also true at a predecessor world $v$ of $w$. Provers that do not construct tree
Kripke structures, but general Kripke structures, or use caching, can
potentially take advantage of this and construct ‘shallower’ models. On the other hand,
the outdegree of worlds increases.

5. The definitional reduction for K45, KD45 and KT45 takes account of the fact 
that instances of 4 are only required to hold at the root world. At all other 
worlds, instances of 5 are already sufficient to enforce transitivity of the acces- 
sibility relation in Kripke structures for these logics. This restriction to the root 
world is in line with the construction of the definitional reduction $\rho^{ml}_L$ in Eq. 1,
namely, that we have different sets of clauses associated with each modal level.
In contrast, the construction of the axiomatic reduction $\rho^{ax}_L$ in Eq. 2 assumes
that we use the same set of axiom instances at every modal level. 


We will revisit the effect that Points 2, 3, and 4 have on the size and modal 
depths of formulae, on the performance of provers, and the models they may 
produce in the next section. 


4 Evaluation 


In our evaluation we compare the effect of using the definitional reduction and
the axiomatic reduction as input for three provers for K: CEGARBox [10],
Spartacus [13], and KSP [24,30]. Spartacus and CEGARBox were included as they
showed the best performance in recent evaluations [10,24-26,29,30] when compared
with several other provers with built-in support for modal logics: BDDTab [12],
FaCT++ [34], InKreSAT [16], SPASS [33], and Leo-III+E [8,31].

We have included two more approaches in the comparison: (i) the global
modal resolution (GMR) calculi [21] that include specific inference rules for each
of the logics in the modal cube, implemented in KSP; (ii) modal layered resolution
(MLR) calculi [22] together with the reductions given in Table 4, again
implemented in KSP. The first is an example of ‘native’ reasoning in the logics
concerned, while the inclusion of the latter allows us to investigate the effect of
‘internalising’ the reduction and having inference rules that operate on modal
clauses. Both calculi support several refinements of resolution. We report only
results for the ordered refinement (cord) as it was the best performing overall.

The two reductions combined with CEGARBox, Spartacus, and KSP and the
GMR and MLR calculi in KSP give us a total of eight different approaches.

We have used the benchmarks introduced in [27], which comprise¹ (i) 100
unsatisfiable formulae for each of the logics being considered; these are based on 
20 formulae each from 5 classes of the LWB benchmark collection [3] modified 
so that the formulae for logic L are only unsatisfiable in L and its extensions; 
and also (ii) 100 formulae that are S5-satisfiable, that is, formulae that are 
satisfiable in all 15 logics; these consist of 20 formulae each from 5 classes of the 
LWB benchmark collection. 

We have supplied all reductions and provers with preprocessed formulae
extracted from KSP. The simplified negation normal form for a formula φ,
nnf(φ), is generated by KSP as follows. First, the formula is rewritten into box
normal form [28], a normal form similar to the negation normal form, but where
the operator ◇ is rewritten as ¬□¬. To the resulting formula, we apply prenexing
[20], that is, moving the modal operators outwards as much as possible. The
simplification rules given in Table 2 are then applied together with pure literal
elimination (i.e. replacing occurrences of pure literals by true) and constant
propagation. Table 6 shows the effect of all these preprocessing steps on average
size, average modal depth, and average number of boxes in our benchmark
formulae, separately for unsatisfiable (U) and satisfiable (S) formulae. Over all
formulae we get a 20% reduction in size and a 66% reduction in the number of
□-operators. The modal depth remains unchanged, which is an indication of the
robustness of the benchmarks.
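The first preprocessing step and the measures reported in Table 6 can be illustrated with a minimal Python sketch. It is not the KSP implementation: the class names and helper functions are hypothetical, and the sketch only rewrites ◇ as ¬□¬ without pushing negations inwards, prenexing, or applying the simplification rules of Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Not:
    arg: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Box:
    arg: object


@dataclass(frozen=True)
class Dia:
    arg: object


def to_box_form(f):
    """Rewrite every diamond as not-box-not; all other connectives are kept."""
    if isinstance(f, Var):
        return f
    if isinstance(f, Not):
        return Not(to_box_form(f.arg))
    if isinstance(f, (And, Or)):
        return type(f)(to_box_form(f.left), to_box_form(f.right))
    if isinstance(f, Box):
        return Box(to_box_form(f.arg))
    return Not(Box(Not(to_box_form(f.arg))))  # remaining case: Dia


def modal_depth(f):
    """Maximal nesting depth of modal operators."""
    if isinstance(f, Var):
        return 0
    if isinstance(f, Not):
        return modal_depth(f.arg)
    if isinstance(f, (And, Or)):
        return max(modal_depth(f.left), modal_depth(f.right))
    return 1 + modal_depth(f.arg)  # Box or Dia


def count_boxes(f):
    """Number of box operators occurring in the formula."""
    if isinstance(f, Var):
        return 0
    if isinstance(f, (And, Or)):
        return count_boxes(f.left) + count_boxes(f.right)
    return (1 if isinstance(f, Box) else 0) + count_boxes(f.arg)


formula = And(Dia(Var("p")), Box(Dia(Var("q"))))
boxed = to_box_form(formula)
print(modal_depth(formula), modal_depth(boxed))  # the depth is unchanged
print(count_boxes(formula), count_boxes(boxed))  # diamonds now count as boxes
```

In this sketch the rewriting leaves the modal depth untouched, while every former ◇ contributes one additional □.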

For the axiomatic reduction, the resulting formula is then extracted from KSP
and the reduction according to Eq. 2 and Table 5 is applied externally. For the


Table 6. Effect of preprocessing on benchmark formulae

    | Original Formulae                        | Simplified Formulae
Sat | Avg Size | Avg Mod. Depth | Avg #Boxes   | Avg Size | Avg Mod. Depth | Avg #Boxes
U   |    17931 |             16 |          405 |    15549 |             16 |        241
S   |     3641 |             48 |          719 |     1979 |             48 |        146

1 Input files for the provers used here and the source for KSP are available at
http://nalon.org/#software.
definitional reduction, the formula is not extracted but transformed by KSP into
$\mathrm{SNF}_{ml}$ according to Tables 3 and 4. During the transformation into the normal
form, complex subformulae are replaced by the same symbol in all positions they
might occur. After transformation into $\mathrm{SNF}_{ml}$, the kept clauses are extracted
from KSP and used to produce the modal formula for the definitional reduction
according to Eq. 1.

Table 7 shows experimental results comparing the performance of the eight
approaches. The first three columns of the table show the logic, the satisfiability
status of the formulae for our benchmark collection used for this logic (‘U’ for
‘unsatisfiable’, ‘S’ for ‘satisfiable’), and their number. In total we have 30 sets
of benchmark formulae. The next eight columns then show how many of those
formulae were solved by each of the eight approaches. A time limit of 100 CPU
seconds was set for each formula and where a reduction is used the time taken
includes the computation of the reduction. The highest number or numbers in
each row are highlighted in bold. The last six columns show the results for $\rho^{ml}_L$
and $\rho^{ax}_L$ combined with CEGARBox, Spartacus, and KSP. Here, for each logic L and
each satisfiability status we have indicated with italics which reduction resulted
in better performance for each of the three provers. In the following we call each
such pair a comparison point. Benchmarking was performed on a PC with an
AMD Ryzen 5 5600X CPU @ 4.60 GHz max and 64 GB main memory using
Fedora release 37 as operating system.

For both satisfiable and unsatisfiable benchmark formulae, the combination 
of the definitional reduction with CEGARBox performs best. Overall, it solves 
25% more formulae than the second best approach, the GMR calculi in KSP.
CEGARBox with the definitional reduction also outperforms CEGARBox with the
axiomatic reduction on both satisfiable and unsatisfiable benchmark formulae.
The same is true for the MLR calculus in KSP when combined with one of
the two reductions and for Spartacus on satisfiable benchmark formulae when 
combined with one of the two reductions. 
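The overall comparison between the best and the second best approach can be retraced from the ‘All’ rows of Table 7. The following Python sketch only adds up those totals; the dictionary keys are labels chosen here for illustration.

```python
# Totals taken from the 'All' rows of Table 7 (satisfiable + unsatisfiable).
solved = {
    "CEGARBox + definitional reduction": 1307 + 1126,
    "KSP (GMR)": 1010 + 915,
}

best = solved["CEGARBox + definitional reduction"]
second = solved["KSP (GMR)"]

# Relative number of additional formulae solved by the best approach.
relative_gain = (best - second) / second
print(f"{best} vs {second}: {relative_gain:.1%} more formulae solved")
```

On these totals the gain comes out at roughly a quarter, in line with the rounded figure given in the text.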

We can see that the internal transformation to $\mathrm{SNF}_{ml}$ together with the MLR
calculus in KSP performs better than first computing the definitional reduction
$\rho^{ml}_L$ and then handing the resulting formula to KSP. The former approach
performs better on 26 out of 30 sets of benchmark formulae. This is not surprising
since in the latter case KSP does apply the transformation into $\mathrm{SNF}_{ml}$ again. This
implies that new propositional symbols are introduced when applying renaming
and new clauses are added defining those symbols. Also, for the ordered resolution
refinement we use, all literals in the scope of modal operators will be
renamed in order to retain completeness [22]. Again, for each renamed literal
there will be an additional clause. Overall, KSP will perform inferences with a
larger set of $\mathrm{SNF}_{ml}$ clauses over a larger set of propositional symbols. This is
bound to degrade performance in most cases.

Looking at individual logics, a more varied picture is evident. Consider both
satisfiable and unsatisfiable benchmark formulae for the logics K5, KD5, K4B
(which is the same logic as K5B), K45, KD45, and S5 and the behaviour of
Spartacus and KSP with one of the two reductions on these. Of these 24 comparison
points, the axiomatic reduction results in better performance on 21 and
Table 7. Performance of KX provers, $\rho^{ml}_L$ combined with K provers and $\rho^{ax}_L$ combined
with K provers

L    | Sat | Total | KSP   | KSP   | CEGARBox      | Spartacus     | KSP (MLR)
     |     |       | (GMR) | (MLR) | ρ^ml  | ρ^ax  | ρ^ml  | ρ^ax  | ρ^ml  | ρ^ax
K    | S   |   100 |    85 |   100 |   100 |   100 |   100 |   100 |   100 |   100
KD   | S   |   100 |    85 |   100 |   100 |    62 |    92 |    24 |   100 |    51
KT   | S   |   100 |    81 |    68 |   100 |    63 |    65 |    43 |    65 |    38
KB   | S   |   100 |    58 |    64 |   100 |    64 |   100 |    46 |    65 |    36
K4   | S   |   100 |    85 |    58 |    65 |    50 |    56 |    47 |    50 |    18
K5   | S   |   100 |    60 |    38 |    88 |    94 |    45 |    73 |    22 |    27
KDB  | S   |   100 |    70 |    73 |   100 |    30 |    82 |    12 |    65 |    30
KTB  | S   |   100 |    60 |    57 |   100 |    49 |    66 |    16 |    56 |    31
KD4  | S   |   100 |    85 |    54 |    57 |    46 |    26 |    19 |    48 |    18
KD5  | S   |   100 |    70 |    47 |    86 |    78 |    18 |    50 |    32 |    27
K45  | S   |   100 |    53 |    38 |    88 |    90 |    45 |    83 |    22 |    37
KB4  | S   |   100 |    19 |    38 |    94 |    84 |    63 |    90 |    34 |    59
KD45 | S   |   100 |    66 |    47 |    86 |    84 |    18 |    61 |    32 |    34
S4   | S   |   100 |    76 |    44 |    59 |    41 |    36 |    19 |    37 |    15
S5   | S   |   100 |    57 |    42 |    84 |    81 |    37 |    65 |    20 |    39
All  | S   |  1500 |  1010 |   868 |  1307 |  1016 |   849 |   748 |   748 |   560

L    | Sat | Total | KSP   | KSP   | CEGARBox      | Spartacus     | KSP (MLR)
     |     |       | (GMR) | (MLR) | ρ^ml  | ρ^ax  | ρ^ml  | ρ^ax  | ρ^ml  | ρ^ax
K    | U   |   100 |    76 |    73 |    90 |    90 |    76 |    88 |    82 |    83
KD   | U   |   100 |    76 |    75 |    89 |    73 |    73 |    49 |    78 |    35
KT   | U   |   100 |    78 |    76 |    89 |    80 |    70 |    44 |    71 |    31
KB   | U   |   100 |    79 |    52 |    82 |    66 |    37 |    32 |    49 |    46
K4   | U   |   100 |    53 |    30 |    57 |    54 |    30 |    27 |    11 |    26
K5   | U   |   100 |    46 |    32 |    82 |    57 |     7 |    60 |    26 |    27
KDB  | U   |   100 |    78 |    53 |    82 |    40 |    38 |     5 |    48 |    23
KTB  | U   |   100 |    77 |    50 |    84 |    43 |    52 |    17 |    47 |    17
KD4  | U   |   100 |    59 |    35 |    51 |    35 |     3 |     4 |     8 |    14
KD5  | U   |   100 |    46 |    45 |    77 |    59 |     5 |    61 |    35 |    10
K45  | U   |   100 |    40 |    14 |    58 |    53 |     2 |    49 |     7 |    26
KB4  | U   |   100 |    52 |    32 |    87 |    64 |    56 |    72 |    33 |    39
KD45 | U   |   100 |    43 |    29 |    57 |    53 |     2 |    49 |    15 |    12
S4   | U   |   100 |    68 |    23 |    55 |    48 |    32 |    17 |    14 |     9
S5   | U   |   100 |    44 |    26 |    86 |    50 |     3 |    58 |     7 |     9
All  | U   |  1500 |   915 |   650 |  1126 |   865 |   486 |   632 |   520 |   407


the definitional reduction only on 3. In particular, Spartacus with the axiomatic
reduction consistently shows better performance for these logics than with the
definitional reduction. In stark contrast, CEGARBox with the definitional reduction
still performs better on 10 out of 12 comparison points. Interestingly, this
advantage of the axiomatic reduction does not carry over to K4 and its extensions
KD4 and S4. Here, with the exception of 3 out of 18 comparison points, the
definitional reduction with one of CEGARBox, KSP, and Spartacus leads to better
performance than the axiomatic reduction.


Table 8. Comparison of axiomatic and definitional reduction combined with Spartacus
on satisfiable benchmark formulae.

      |           |        | Solved  | Formulae                   | Models
Logic | Reduction | Solved | by both | Avg size | Avg modal depth | Avg num of worlds | Avg num of edges | Avg depth
K     | ρ^ml      |    100 |     100 |     5022 |              48 |                16 |               21 |         7
K     | ρ^ax      |    100 |     100 |     1722 |              48 |                16 |               21 |         7
KD    | ρ^ml      |     92 |      24 |      881 |               4 |                16 |               22 |         4
KD    | ρ^ax      |     24 |      24 |    13178 |               9 |               224 |             7870 |         5
KT    | ρ^ml      |     65 |      43 |     6746 |               8 |                43 |             1164 |         4
KT    | ρ^ax      |     43 |      43 |    54700 |              17 |               272 |             5967 |         9
KB    | ρ^ml      |    100 |      45 |    37570 |              32 |                 2 |                2 |         0
KB    | ρ^ax      |     46 |      45 |  1552250 |              66 |               143 |             1037 |         1
K4    | ρ^ml      |     56 |      34 |   339748 |             266 |                15 |               48 |         4
K4    | ρ^ax      |     47 |      34 |   501328 |              60 |               241 |             5322 |         9
K5    | ρ^ml      |     45 |      44 |   342049 |              91 |               102 |              817 |         2
K5    | ρ^ax      |     73 |      44 |   166858 |              65 |                19 |               58 |         1
KDB   | ρ^ml      |     82 |      12 |      469 |               3 |                 9 |               20 |         2
KDB   | ρ^ax      |     12 |      12 |     3637 |               7 |               255 |             3513 |         4
KTB   | ρ^ml      |     66 |      16 |      805 |               4 |                14 |               32 |         3
KTB   | ρ^ax      |     16 |      16 |     4277 |               8 |               693 |             4109 |         5
KD4   | ρ^ml      |     26 |      19 |    20729 |              35 |               267 |             3073 |        34
KD4   | ρ^ax      |     19 |      19 |    18457 |              17 |               362 |             6676 |        11
KD5   | ρ^ml      |     18 |      18 |     4670 |               8 |               246 |             4112 |         7
KD5   | ρ^ax      |     50 |      18 |     3784 |               7 |                39 |              334 |         3
K45   | ρ^ml      |     45 |      44 |   342095 |              91 |               101 |              810 |         2
K45   | ρ^ax      |     83 |      44 |   111403 |              64 |                15 |               86 |         1
KB4   | ρ^ml      |     63 |      62 |   257823 |              76 |                36 |               55 |         6
KB4   | ρ^ax      |     90 |      62 |     6546 |              50 |                60 |              709 |         5
KD45  | ρ^ml      |     18 |      18 |     4689 |               8 |               241 |             4035 |         7
KD45  | ρ^ax      |     61 |      18 |     2383 |               6 |                36 |               88 |         3
S4    | ρ^ml      |     36 |      19 |     2505 |              40 |               145 |              719 |        22
S4    | ρ^ax      |     19 |      19 |    23790 |              18 |               300 |             4723 |        12
S5    | ρ^ml      |     37 |      34 |    19284 |              17 |               205 |             2968 |        15
S5    | ρ^ax      |     65 |      34 |      954 |              10 |               195 |             1243 |         7


We can gain additional insight by looking in more detail at the behaviour of 
provers. While this would be most beneficial for CEGARBox, this tool currently 
only outputs the satisfiability status of formulae but neither models nor proofs. 
Instead we turn to Spartacus which can output models for satisfiable formulae. 
Table 8 shows information on the input formulae that were given to Spartacus, 
resulting from one of our reductions, and the models that Spartacus produced. 
The first four columns show the logic, the reduction that was used, how many 
satisfiable benchmark formulae (out of 100) Spartacus was able to solve, and 
how many formulae it was able to solve with both reductions. The number in 
the fourth column is not necessarily the minimum of the two numbers in the 
Table 9. Comparison of axiomatic and definitional reduction combined with KSP on
unsatisfiable benchmark formulae.

      |           |        | Solved  | Formulae                   | Proof Search
Logic | Reduction | Solved | by both | Avg size | Avg modal depth | Avg num inferences | Avg proof size | Avg proof max level
K     | ρ^ml      |     82 |      82 |     1386 |              17 |              26912 |            487 |                  17
K     | ρ^ax      |     83 |      82 |      933 |              17 |              20738 |            256 |                  17
KD    | ρ^ml      |     78 |      35 |      614 |               9 |                792 |            293 |                   9
KD    | ρ^ax      |     35 |      35 |     7311 |              18 |             295516 |            164 |                   9
KT    | ρ^ml      |     71 |      31 |     2045 |               9 |               9787 |            191 |                   2
KT    | ρ^ax      |     31 |      31 |     7145 |              18 |             101555 |            238 |                   9
KB    | ρ^ml      |     49 |      46 |     2510 |              13 |              27219 |            749 |                   7
KB    | ρ^ax      |     46 |      46 |    13882 |              27 |             279984 |            285 |                   8
K4    | ρ^ml      |     11 |      11 |    31716 |             133 |             121620 |            309 |                   5
K4    | ρ^ax      |     26 |      11 |     7629 |              23 |              52571 |            160 |                   5
K5    | ρ^ml      |     26 |      11 |     4687 |              17 |             139615 |            259 |                   3
K5    | ρ^ax      |     27 |      11 |     1552 |               6 |              78125 |            377 |                   4
KDB   | ρ^ml      |     48 |      21 |     1391 |               8 |              11045 |            327 |                   3
KDB   | ρ^ax      |     23 |      21 |     6660 |              17 |             244031 |            252 |                   7
KTB   | ρ^ml      |     47 |      17 |     1575 |               7 |               3212 |            199 |                   2
KTB   | ρ^ax      |     17 |      17 |     5875 |              16 |             247655 |            392 |                   7
KD4   | ρ^ml      |      8 |       8 |    82247 |             231 |             238114 |            350 |                   6
KD4   | ρ^ax      |     14 |       8 |    11707 |              31 |             109937 |            275 |                   6
KD5   | ρ^ml      |     35 |      10 |     8340 |              17 |              69268 |            157 |                   3
KD5   | ρ^ax      |     10 |      10 |     1926 |               8 |             200611 |            303 |                   4
K45   | ρ^ml      |      7 |       7 |     7138 |              21 |             450621 |            372 |                   4
K45   | ρ^ax      |     26 |       7 |     1665 |               7 |              71733 |            873 |                   5
KB4   | ρ^ml      |     33 |      24 |     7562 |              21 |             164844 |            226 |                   4
KB4   | ρ^ax      |     39 |      24 |     2724 |              11 |             174535 |           2189 |                   7
KD45  | ρ^ml      |     15 |       5 |     8023 |              17 |             107212 |            555 |                   6
KD45  | ρ^ax      |     12 |       5 |     1405 |               7 |              88261 |            768 |                   5
S4    | ρ^ml      |     14 |       9 |    63723 |             215 |             122671 |            252 |                   3
S4    | ρ^ax      |      9 |       9 |    12102 |              28 |             255852 |            289 |                   4
S5    | ρ^ml      |      7 |       4 |     5731 |              18 |             221199 |            210 |                   4
S5    | ρ^ax      |      9 |       4 |     1301 |               7 |              98434 |            435 |                   5


third column for a particular logic. The next two columns contain the average
size and average modal depth of $\rho^{ml}_L(\varphi)$ and $\rho^{ax}_L(\varphi)$ for the formulae
that Spartacus solves with both reductions.
Finally, the last three columns contain the average number of worlds, number
of edges, and depth of the models for these formulae. Spartacus uses blocking,
even for the modal logic K, and the models it produces are not trees but general
graphs. A fine-grained analysis on the level of individual formulae shows that,
with the exception of the logic KB4, it is generally the case that the reduction
that produces smaller formulae leads Spartacus to produce smaller models,
and thereby also leads to more formulae being solved. Only for KB4 are there
more instances where a larger formula resulting from a reduction, namely the
definitional reduction, leads to smaller models. However, it is still the case that
the axiomatic reduction then allows more formulae to be solved for KB4.

For unsatisfiable formulae we consider KSP. Table 9 shows information on the
input formulae that were given to KSP, resulting from one of our reductions, and
the proof search conducted by KSP. The first six columns correspond to those in
Table 8. The final three columns contain the average number of inference steps
KSP requires to find a proof, the average size of those proofs, and the average
maximal modal level of a clause in those proofs. Again we see that the reduction
that produces smaller formulae, with few exceptions, also leads KSP to find
proofs in fewer inference steps and allows it to solve more formulae.
5 Conclusions


The axiomatic and the definitional reductions from logics in the modal cube
to basic modal logic that we have presented in this paper allow any decision
procedure for basic modal logic to be used to solve the satisfiability problem in
all 15 logics of the modal cube. This is of particular interest as over the last 25
years, a range of decision procedures for basic modal logic have been implemented
and improved [2,6,7,10-13,15,34] but only a few implemented decision procedures
for all logics of the modal cube exist. Our empirical results also indicate that
such reductions are not only a theoretical possibility but are effective and
efficient: the combination of the definitional reduction with CEGARBox is currently
the best performing approach on our collection of benchmark formulae for the
modal cube. There are a number of other contributing factors to the efficiency
of the approach that are also beneficial outside the context of reductions.
Preprocessing techniques such as simplification and prenexing can reduce the size
and, in the context of modal logics, the number of modal operators in a modal
formula. The use of surrogate propositional symbols and of a clausal normal
form allows us to further reduce the size and structural complexity of formulae.

Despite the positive empirical results, we nevertheless hope that more provers
that natively support all the logics of the modal cube will be implemented. At
the moment our comparison is limited to our own resolution-based prover KSP.
Support for modal logics except K in other provers is often limited to KD, KT,
and S4. A wider range of provers for all logics in the modal cube would allow us to
establish the robustness of our empirical results and possibly enable us to identify
strengths and weaknesses relative to native provers. It would be beneficial if such
support for native reasoning in the logics of the modal cube would also include the
provision of proofs for unsatisfiable formulae and models for satisfiable formulae
as well as some abstract measure of the computational effort expended in finding
those. This is paramount for our ability to explain the behaviour of provers on
our benchmarks.

Finally, our collection of benchmark formulae requires further refinement. 
Some of the satisfiable formulae in that collection seem to allow rather small 
models and overall do not appear to be sufficiently challenging across all the 
logics. We will need to investigate whether this can be remedied simply by mov- 
ing to higher parameter values for these parameterised classes of formulae or 
whether completely new classes of formulae are required. 
Abstract. We revisit AC completion for left-linear term rewrite systems
where AC unification is avoided and the normal rewrite relation can be 
used in order to decide validity questions. To that end, we give a new 
correctness proof for finite runs and establish a simulation result between 
the two inference systems known from the literature. Furthermore, we 
show how left-linear AC completion can be simulated by general AC 
completion. In particular, this result allows us to switch from the former 
to the latter at any point during a completion process. Finally, we present 
experimental results for our implementation of left-linear AC completion 
in the tool accompll. 


Keywords: Completion · AC axioms · Term rewriting


1 Introduction 


Completion has been extensively studied since its introduction in the seminal
paper by Knuth and Bendix [10]. One of the main limitations of the original
formulation is its inability to deal with equations which cannot be oriented into
a terminating rule such as the commutativity axiom. This shortcoming can be
resolved by completion modulo an equational theory $\mathcal{E}$. In the literature, there
are two different approaches to achieving this. The general approach [3, 6] requires
$\mathcal{E}$-unification and allows us to decide validity problems using the rewrite relation
$\to_{\mathcal{R}/\mathcal{E}}$ which is defined as $\leftrightarrow^*_{\mathcal{E}} \cdot \to_{\mathcal{R}} \cdot \leftrightarrow^*_{\mathcal{E}}$. For left-linear term rewrite systems,
however, there is Huet’s approach [5] which avoids $\mathcal{E}$-unification and allows us
to decide validity problems with the normal rewrite relation $\to_{\mathcal{R}}$ and a single
check for $\mathcal{E}$-equivalence of the computed normal forms. In their respective books,
Avenhaus [1] and Bachmair [3] present inference systems for left-linear completion
modulo an equational theory. In this paper, we revisit slightly modified
versions (A and B) of these inference systems for finite runs. In addition to a
new correctness proof for A in the spirit of [4] which does not rely on proof
orderings (Sect. 3), we reduce correctness of B to the correctness of A by establishing
a simulation result between finite runs in these systems (Sect. 4). For
the concrete equational theory of associative and commutative (AC) function
symbols, we also show the connection between the inference system A and general
AC completion by means of another simulation result (Sect. 5). Finally, we
present experimental results obtained from our implementation of A for AC in
the tool accompll which show that the avoidance of AC unification can result
in significant performance improvements over general AC completion (Sects. 6
and 7).


2 Preliminaries 


We assume familiarity with term rewriting and completion as described e.g. in
[2] but recall some central notions. We consider term rewriting systems (TRSs)
which operate on terms over a given signature $\mathcal{F}$. Terms which do not contain
the same variable more than once are referred to as linear terms. We say that
a TRS $\mathcal{R}$ is left-linear if $\ell$ is a linear term for every rule $\ell \to r \in \mathcal{R}$. A TRS $\mathcal{R}$
is terminating if the associated rewrite relation $\to_{\mathcal{R}}$ is well-founded. In that
case, we write $s \to^!_{\mathcal{R}} t$ if t is a normal form of s. A TRS $\mathcal{R}$ is confluent if
different computation paths can always be joined, i.e., ${}^*_{\mathcal{R}}{\leftarrow} \cdot \to^*_{\mathcal{R}} \subseteq \to^*_{\mathcal{R}} \cdot {}^*_{\mathcal{R}}{\leftarrow}$.
An important sufficient criterion for confluence is the well-known critical pair
lemma which states that a terminating TRS is confluent if all non-trivial overlaps
between left-hand sides of rules (critical pairs) are joinable. Furthermore, there is
the notion of prime critical pairs [8] which further restricts the considered critical
peaks $t \xleftarrow{p}_{\mathcal{R}} s \xrightarrow{\epsilon}_{\mathcal{R}} u$ to the ones where all proper subterms of $s|_p$ are irreducible.
In particular, terminating TRSs whose prime critical pairs are joinable are also
confluent. The set of (prime) critical pairs is denoted by $\mathrm{CP}(\mathcal{R})$ ($\mathrm{PCP}(\mathcal{R})$). We
define $\mathrm{CP}(\mathcal{R}_1, \mathcal{R}_2)$ as the set of all critical pairs stemming from local peaks of
the form $t \mathrel{{}_{\mathcal{R}_1}{\leftarrow}} s \to_{\mathcal{R}_2} u$ and $\mathrm{CP}^{\pm}(\mathcal{R}_1, \mathcal{R}_2) = \mathrm{CP}(\mathcal{R}_1, \mathcal{R}_2) \cup \mathrm{CP}(\mathcal{R}_2, \mathcal{R}_1)$. A TRS
is complete if it is terminating and confluent. Hence, a complete presentation $\mathcal{R}$
of an equational system (ES) $\mathcal{E}$ can be used to decide the validity problem for
$\mathcal{E}$: $s \leftrightarrow^*_{\mathcal{E}} t$ if and only if $s \to^!_{\mathcal{R}} \cdot {}^!_{\mathcal{R}}{\leftarrow} t$.
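The notions of linear terms and left-linear TRSs are easy to check mechanically. The following Python sketch assumes a hypothetical term representation that is not part of the paper: a variable is a plain string and a function application is a tuple whose first component is the function symbol.

```python
from collections import Counter


def variables(term):
    """Yield every variable occurrence of a term (variables are plain strings)."""
    if isinstance(term, str):
        yield term
    else:
        _symbol, *args = term
        for arg in args:
            yield from variables(arg)


def is_linear(term):
    """A term is linear if no variable occurs in it more than once."""
    return all(count == 1 for count in Counter(variables(term)).values())


def is_left_linear(trs):
    """A TRS, given as a list of (lhs, rhs) pairs, is left-linear if every lhs is linear."""
    return all(is_linear(lhs) for lhs, _rhs in trs)


zero = ("0",)
unit_rule = (("+", "x", zero), "x")      # x + 0 -> x
collapse_rule = (("f", "x", "x"), "x")   # f(x, x) -> x

print(is_left_linear([unit_rule]))                  # True
print(is_left_linear([unit_rule, collapse_rule]))   # False
```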

We now turn our attention to rewriting modulo AC function symbols. To that
end, we start by giving general definitions for abstract rewrite systems (ARSs).
Let $\mathcal{A} = (A, \to)$ be an ARS and $\sim$ an equivalence relation on A. We write $\Leftrightarrow$ for
$\leftarrow \cup \to \cup \sim$, $\to/{\sim}$ for $\sim \cdot \to \cdot \sim$ and $\downarrow^{\sim}$ for $\to^* \cdot \sim \cdot {}^*{\leftarrow}$. Given $\mathcal{A}$, we denote
$(A, \to/{\sim})$ by $\mathcal{A}/{\sim}$. The ARS $\mathcal{A}$ is terminating modulo $\sim$ if there are no infinite
rewrite sequences with $\to/{\sim}$ and Church-Rosser modulo $\sim$ if $\Leftrightarrow^* \subseteq \downarrow^{\sim}$. The
ARS $\mathcal{A}$ is complete modulo $\sim$ if it is terminating modulo $\sim$ and Church-Rosser
modulo $\sim$. While there is no distinction for termination modulo $\sim$ between $\mathcal{A}$
and $\mathcal{A}/{\sim}$ ($\sim \cdot \sim = \sim$ by transitivity), it makes a considerable difference whether
we talk about the Church-Rosser modulo $\sim$ property and therefore completeness
modulo $\sim$ of $\mathcal{A}$ or $\mathcal{A}/{\sim}$. The following lemma is taken from [1, Lemma 4.1.12].
It establishes an important connection between the Church-Rosser modulo $\sim$
property of an ARS $\mathcal{A}$ and $\mathcal{A}/{\sim}$.
Lemma 1. Let A = (A,—) and A’ = (A,—) be ARSs and ~ an equivalence
relation on A such that = C — C —>/~. If A! is Church-Rosser modulo ~ then 
A/~ is Church-Rosser modulo ~. 
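The difference between the Church-Rosser modulo ∼ property of an ARS and of its quotient can be observed on a small finite example. The following Python sketch is an illustration under assumptions made here: the ARS is finite, its relations are given as dictionaries, and the function names are hypothetical.

```python
def reachable(start, step):
    """Return every element reachable from start by zero or more steps."""
    seen, todo = {start}, [start]
    while todo:
        a = todo.pop()
        for b in step.get(a, ()):
            if b not in seen:
                seen.add(b)
                todo.append(b)
    return seen


def church_rosser_modulo(elements, rewrite, sym):
    """Check that every conversion is a valley ->* . ~ . *<- (finite ARS only).

    rewrite maps an element to its one-step successors; sym maps an element
    to its neighbours in a symmetric relation whose reflexive and transitive
    closure is the equivalence ~.
    """
    # One conversion step is a rewrite step in either direction or a sym step.
    conv = {a: set(rewrite.get(a, ())) | set(sym.get(a, ())) for a in elements}
    for a in elements:
        for b in rewrite.get(a, ()):
            conv[b].add(a)

    def joinable(a, b):
        reducts_b = reachable(b, rewrite)
        return any(reachable(x, sym) & reducts_b for x in reachable(a, rewrite))

    return all(joinable(a, b) for a in elements for b in reachable(a, conv))


elements = ["a", "b", "c"]
equivalence_steps = {"a": {"b"}, "b": {"a"}}   # a ~ b
plain = {"b": {"c"}}                           # the ARS itself: b -> c
quotient = {"a": {"c"}, "b": {"c"}}            # ~ . -> . ~ for the same ARS

print(church_rosser_modulo(elements, plain, equivalence_steps))     # False
print(church_rosser_modulo(elements, quotient, equivalence_steps))  # True
```

Here a and c are convertible, but a is a normal form of the plain relation that is not equivalent to c, so only the quotient is Church-Rosser modulo ∼.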


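The definitions and results for ARSs carry over to TRSs by replacing the
equivalence relation $\sim$ by the equational theory $\leftrightarrow^*_{\mathcal{B}}$ of an ES $\mathcal{B}$. Most theoretical
results of this paper are not specific to AC but hold for an arbitrary base
theory $\mathcal{B}$ of which we only demand that $\mathrm{Var}(\ell) = \mathrm{Var}(r)$ for all $\ell \approx r \in \mathcal{B}$. We
abbreviate $\leftrightarrow^*_{\mathcal{B}}$ by $\sim_{\mathcal{B}}$ and the rewrite relation $\to_{\mathcal{R}/\mathcal{B}}$ is defined as $\sim_{\mathcal{B}} \cdot \to_{\mathcal{R}} \cdot \sim_{\mathcal{B}}$.
Furthermore, we write $\downarrow^{\sim}_{\mathcal{R}}$ for the relation $\to^*_{\mathcal{R}} \cdot \sim_{\mathcal{B}} \cdot {}^*_{\mathcal{R}}{\leftarrow}$. Termination modulo $\mathcal{B}$
is shown by $\mathcal{B}$-compatible reduction orders >, i.e., > is well-founded, closed under
contexts and substitutions and $\sim_{\mathcal{B}} \cdot > \cdot \sim_{\mathcal{B}} \subseteq\ >$. This paper deals with a completion
procedure which produces TRSs $\mathcal{R}$ such that $\mathcal{R}$ (rather than $\mathcal{R}/\mathcal{B}$) is complete
modulo $\mathcal{B}$. In particular, the completion procedure uses the joinability with
respect to $\downarrow^{\sim}_{\mathcal{R}}$ of $\mathrm{CP}(\mathcal{R}) \cup \mathrm{CP}^{\pm}(\mathcal{R}, \mathcal{B}^{\pm})$ where $\mathcal{B}^{\pm}$ denotes $\mathcal{B} \cup \{r \approx \ell \mid \ell \approx r \in \mathcal{B}\}$
as a sufficient and necessary criterion for the Church-Rosser modulo $\mathcal{B}$ property
of a $\mathcal{B}$-terminating TRS $\mathcal{R}$. Note that this criterion works with standard critical
pairs and therefore does not need unification modulo $\mathcal{B}$. However, the criterion
is not valid for non-left-linear TRSs as the following example shows.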


Example 1. Consider the TRS $\mathcal{R}$ consisting of the single rule $f(x, x) \to x$ with
+ as an additional AC function symbol. There are no critical pairs in $\mathcal{R}$ and
between $\mathcal{R}$ and AC, so $\mathrm{CP}(\mathcal{R}) = \mathrm{CP}^{\pm}(\mathcal{R}, \mathrm{AC}^{\pm}) = \emptyset$. Now consider the conversion
$f(x + y, y + x) \sim_{\mathrm{AC}} f(x + y, x + y) \to_{\mathcal{R}} x + y$. According to the criterion,
$f(x + y, y + x) \downarrow^{\sim}_{\mathcal{R}} x + y$ should hold, but this is clearly not the case.
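The failure in Example 1 can be retraced with a small Python sketch. The term representation (strings for variables, tuples for applications), the AC canonical form obtained by flattening and sorting, and all function names are assumptions made for this illustration only.

```python
def ac_canon(term, ac_symbols=("+",)):
    """Canonical representative modulo AC: flatten AC symbols, sort arguments."""
    if isinstance(term, str):
        return term
    symbol, *args = term
    args = [ac_canon(a, ac_symbols) for a in args]
    if symbol in ac_symbols:
        flat = []
        for a in args:
            if not isinstance(a, str) and a[0] == symbol:
                flat.extend(a[1:])   # a is already flattened
            else:
                flat.append(a)
        args = sorted(flat, key=repr)
    return (symbol, *args)


def match(pattern, term, subst=None):
    """Syntactic matching; variables of the subject term are treated as constants."""
    subst = dict(subst or {})
    if isinstance(pattern, str):
        if pattern in subst and subst[pattern] != term:
            return None
        subst[pattern] = term
        return subst
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def is_normal_form(term, lhs):
    """True if the rule with left-hand side lhs applies at no position of term."""
    if match(lhs, term) is not None:
        return False
    return isinstance(term, str) or all(is_normal_form(a, lhs) for a in term[1:])


lhs = ("f", "x", "x")                              # left-hand side of f(x, x) -> x
s = ("f", ("+", "x", "y"), ("+", "y", "x"))        # f(x + y, y + x)
t = ("f", ("+", "x", "y"), ("+", "x", "y"))        # f(x + y, x + y)
u = ("+", "x", "y")                                # x + y

print(ac_canon(s) == ac_canon(t))                  # True: s and t are AC-equivalent
print(match(lhs, t) is not None)                   # True: t rewrites to x + y
print(is_normal_form(s, lhs), is_normal_form(u, lhs))  # True True
print(ac_canon(s) == ac_canon(u))                  # False
```

Since s and x + y are both normal forms of the plain rewrite relation but are not AC-equivalent, they cannot be joined in the sense required by the criterion.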


3  Avenhaus’ Inference System 


The idea of completion modulo an equational theory $\mathcal{B}$ for left-linear systems
where the normal rewrite relation can be used to decide validity problems has 
been put forward by Huet [5]. To the best of our knowledge, inference systems for 
this approach are only presented in the books by Avenhaus [1] and Bachmair [3]. 
This section presents a new correctness proof of a version of Avenhaus’ inference 
system for finite runs in the spirit of [4] which does not rely on proof orderings. 
Correctness of Bachmair’s system is established by a simulation result in Sect. 4. 


3.1 Inference System 


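Definition 1. The inference system A is parameterized by a fixed $\mathcal{B}$-compatible
reduction order > on terms. It transforms pairs consisting of an ES $\mathcal{E}$ and a
TRS $\mathcal{R}$ over the common signature $\mathcal{F}$ according to the following inference rules
where $s \simeq t$ denotes either $s \approx t$ or $t \approx s$:

deduce    $\dfrac{\mathcal{E}, \mathcal{R}}{\mathcal{E} \cup \{s \approx t\}, \mathcal{R}}$  if $s \mathrel{{}_{\mathcal{R}}{\leftarrow}} \cdot \to_{\mathcal{R}} t$
orient    $\dfrac{\mathcal{E} \uplus \{s \simeq t\}, \mathcal{R}}{\mathcal{E}, \mathcal{R} \cup \{s \to t\}}$  if $s > t$
deduce    $\dfrac{\mathcal{E}, \mathcal{R}}{\mathcal{E}, \mathcal{R} \cup \{t \to s\}}$  if $s \mathrel{{}_{\mathcal{R}}{\leftarrow}} \cdot \leftrightarrow_{\mathcal{B}} t$
delete    $\dfrac{\mathcal{E} \uplus \{s \approx t\}, \mathcal{R}}{\mathcal{E}, \mathcal{R}}$  if $s \sim_{\mathcal{B}} t$
simplify  $\dfrac{\mathcal{E} \uplus \{s \simeq t\}, \mathcal{R}}{\mathcal{E} \cup \{u \simeq t\}, \mathcal{R}}$  if $s \to_{\mathcal{R}/\mathcal{B}} u$
collapse  $\dfrac{\mathcal{E}, \mathcal{R} \uplus \{t \to s\}}{\mathcal{E} \cup \{u \approx s\}, \mathcal{R}}$  if $t \to_{\mathcal{R}} u$
compose   $\dfrac{\mathcal{E}, \mathcal{R} \uplus \{s \to t\}}{\mathcal{E}, \mathcal{R} \cup \{s \to u\}}$  if $t \to_{\mathcal{R}/\mathcal{B}} u$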


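A step in an inference system I from an ES $\mathcal{E}$ and a TRS $\mathcal{R}$ to an ES $\mathcal{E}'$ and
a TRS $\mathcal{R}'$ is denoted by $(\mathcal{E}, \mathcal{R}) \vdash_{I} (\mathcal{E}', \mathcal{R}')$. The parentheses of the pairs are only
used when the expression is surrounded by text in order to increase readability.
In the following, $\mathrm{PCP}^{\pm}(\mathcal{R}, \mathcal{B}^{\pm})$ denotes the restriction of $\mathrm{CP}^{\pm}(\mathcal{R}, \mathcal{B}^{\pm})$ to prime
critical pairs but where irreducibility is always checked with respect to $\mathcal{R}$, i.e.,
the critical peaks $t \xleftarrow{p}_{\mathcal{R}} s \xleftrightarrow{\epsilon}_{\mathcal{B}} u$ and $t' \xleftrightarrow{p}_{\mathcal{B}} s \xrightarrow{\epsilon}_{\mathcal{R}} u'$ are both prime if all proper
subterms of $s|_p$ are irreducible with respect to $\mathcal{R}$.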


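Definition 2. Let $\mathcal{E}$ be an ES. A finite sequence

$\mathcal{E}_0, \mathcal{R}_0 \vdash_{A} \mathcal{E}_1, \mathcal{R}_1 \vdash_{A} \cdots \vdash_{A} \mathcal{E}_n, \mathcal{R}_n$

with $\mathcal{E}_0 = \mathcal{E}$ and $\mathcal{R}_0 = \emptyset$ is a run for $\mathcal{E}$. If $\mathcal{E}_n \neq \emptyset$, the run fails. The run is
fair if $\mathcal{R}_n$ is left-linear and the following inclusions hold:

$\mathrm{PCP}(\mathcal{R}_n) \subseteq {\downarrow^{\sim}_{\mathcal{R}_n}} \cup \bigcup_{i=0}^{n} \leftrightarrow_{\mathcal{E}_i}$        $\mathrm{PCP}^{\pm}(\mathcal{R}_n, \mathcal{B}^{\pm}) \subseteq {\downarrow^{\sim}_{\mathcal{R}_n}} \cup \bigcup_{i=0}^{n} \leftrightarrow_{\mathcal{R}_i}$
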
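Intuitively, fair and non-failing runs yield a $\mathcal{B}$-complete presentation $\mathcal{R}_n$ of
the initial set of equations $\mathcal{E}$, i.e., $\leftrightarrow^*_{\mathcal{E} \cup \mathcal{B}}\ =\ \leftrightarrow^*_{\mathcal{R}_n \cup \mathcal{B}}\ \subseteq\ \downarrow^{\sim}_{\mathcal{R}_n}$. In particular, the
inference rules are designed to preserve the equational theory augmented by $\mathcal{B}$.
The following example shows that deducing local cliffs (${}_{\mathcal{R}}{\leftarrow} \cdot \leftrightarrow_{\mathcal{B}}$) as rules as
well as the restriction to $\to_{\mathcal{R}}$ in the collapse rule are crucial properties of the
inference system.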


Example 2. Consider the ES $\mathcal{E}$ consisting of the single equation $x + 0 \approx x$ where
+ is an AC function symbol. We clearly have $0 + x \leftrightarrow^*_{\mathcal{E} \cup \mathrm{AC}} x$, so an AC complete
system $\mathcal{R}$ representing $\mathcal{E}$ has to satisfy $0 + x \downarrow^{\sim}_{\mathcal{R}} x$. There is just one way to orient
the only equation in $\mathcal{E}$, which results in the rule $x + 0 \to x$. Since we want our
run to be fair, we add the rules stemming from the prime critical pairs between
$x + 0 \to x$ and $\mathrm{AC}^{\pm}$:


$0 + x \to x$    $x + (0 + y) \to x + y$    $x + (y + 0) \to x + y$    $(x + y) + 0 \to x + y$


If collapsing with $\to_{\mathcal{R}/\mathrm{AC}}$ is allowed, all these rules become trivial equations and
can therefore be deleted. Thus, the modified inference system allows for a fair
run which is not complete as $0 + x \downarrow^{\sim}_{\mathcal{R}} x$ does not hold for $\mathcal{R} = \{x + 0 \to x\}$.
Furthermore, if we add pairs of terms stemming from local cliffs as equations,
we get the same result by applications of simplify.
The inference system presented in Definition 1 is almost the same as the one
presented by Avenhaus in [1]. However, since we only consider finite runs, the
encompassment condition for the collapse rule has been removed in the spirit
of [13]. The following example shows that this can lead to smaller $\mathcal{B}$-complete
systems.


Example 3. Consider the ES $\mathcal{E} = \{f(x + y) \approx f(x) + f(y)\}$ where + is an AC
symbol. The inference system presented in [1] produces the AC complete system


$f(x + y) \to f(x) + f(y)$    $f(y + x) \to f(x) + f(y)$


in which either of the rules could be collapsed if it was allowed to collapse with 
the other rule. In [1] this is prevented by an encompassment condition which 
essentially forbids to collapse at the root position with a rewrite rule whose left- 
hand side is a variant of the left-hand side of the rule which should be collapsed. 
However, this is possible with the system presented in this paper, so for an AC 
complete representation just one of the two rules suffices. 


3.2 Confluence Criterion 


The confluence criterion used in the correctness proof of A is an extended version
of the one used in [4] which we dub peak-and-cliff decreasingness. In the following,
we assume that equivalence relations $\sim$ are defined as the reflexive and transitive
closure of a symmetric relation $\vdash\!\dashv$, so $\sim\ =\ \vdash\!\dashv^*$. Furthermore, we assume that
steps are labeled with labels from a set I, so let $\mathcal{A} = (A, \{\to_\alpha\}_{\alpha \in I})$ be an ARS
and $\sim\ = (\bigcup_{\alpha \in I} \vdash\!\dashv_\alpha)^*$ an equivalence relation on A.


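Definition 3. The ARS $\mathcal{A}$ is peak-and-cliff decreasing if there is a well-founded
order > on I such that for all $\alpha, \beta \in I$ the inclusions

${}_{\alpha}{\leftarrow} \cdot \to_{\beta}\ \subseteq\ \Leftrightarrow^*_{\vee\alpha\beta}$        ${}_{\alpha}{\leftarrow} \cdot \vdash\!\dashv_{\beta}\ \subseteq\ \Leftrightarrow^*_{\vee\alpha} \cdot \vdash\!\dashv^{=}_{\beta}$

hold. Here $\vee\alpha\beta$ denotes the set $\{\gamma \in I \mid \alpha > \gamma \text{ or } \beta > \gamma\}$ and if $J \subseteq I$ then $\to_J$
denotes $\bigcup_{\gamma \in J} \to_\gamma$. We simplify $\vee\alpha\alpha$ to $\vee\alpha$.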


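Lemma 2. Every conversion modulo $\sim$ is either a valley modulo $\sim$ or contains
a local peak or cliff:

$\Leftrightarrow^*\ \subseteq\ {\downarrow^{\sim}}\ \cup\ \Leftrightarrow^* \cdot \leftarrow \cdot \to \cdot \Leftrightarrow^*\ \cup\ \Leftrightarrow^* \cdot \leftarrow \cdot \vdash\!\dashv \cdot \Leftrightarrow^*\ \cup\ \Leftrightarrow^* \cdot \vdash\!\dashv \cdot \to \cdot \Leftrightarrow^*$


The proof of the following theorem is based on a well-founded order on multisets.
We denote the multiset extension of an order > by $>_{mul}$. It is well-known
that the multiset extension of a well-founded order is also well-founded.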


Theorem 1. If A is peak-and-cliff decreasing then A is Church—Rosser mod- 
ulo ~. 
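Proof. With every conversion C we associate a multiset $M_C$ consisting of labels
of its rewrite and equivalence relation steps. Since $\mathcal{A}$ is peak-and-cliff decreasing,
there is a well-founded order > on I which allows us to replace conversions C of
the forms ${}_{\alpha}{\leftarrow} \cdot \to_{\beta}$, ${}_{\alpha}{\leftarrow} \cdot \vdash\!\dashv_{\beta}$ and $\vdash\!\dashv_{\beta} \cdot \to_{\alpha}$ by conversions C' where $M_C >_{mul} M_{C'}$.
Hence, we prove that $\mathcal{A}$ is Church-Rosser modulo $\sim$, i.e., $\Leftrightarrow^* \subseteq \downarrow^{\sim}$, by
well-founded induction on $>_{mul}$. Consider a conversion $a \Leftrightarrow^* b$ which we call C.
By Lemma 2 we either have $a \downarrow^{\sim} b$ (which includes the case that C is empty) or
one of the following cases holds:


$a \Leftrightarrow^* \cdot \leftarrow \cdot \to \cdot \Leftrightarrow^* b$    $a \Leftrightarrow^* \cdot \leftarrow \cdot \vdash\!\dashv \cdot \Leftrightarrow^* b$    $a \Leftrightarrow^* \cdot \vdash\!\dashv \cdot \to \cdot \Leftrightarrow^* b$


If $a \downarrow^{\sim} b$ we are immediately done. In the remaining cases, we have a local peak
or cliff with concrete labels $\alpha$ and $\beta$, so $M_C = \Gamma_1 \uplus \{\alpha, \beta\} \uplus \Gamma_2$. Since $\mathcal{A}$ is
peak-and-cliff decreasing, there is a conversion C' with $M_{C'} = \Gamma_1 \uplus \Gamma \uplus \Gamma_2$ where
$\{\alpha, \beta\} >_{mul} \Gamma$. Hence, $M_C >_{mul} M_{C'}$ and we finish the proof by applying the
induction hypothesis.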


In the following, we connect the joinability of local peaks and cliffs to the 
joinability of prime critical pairs which allows us to apply peak-and-cliff decreas- 
ingness in the correctness proof of A. 


Definition 4. Given a TRS R and terms s, t and u, we write t V, u if s >} t, 


s >} u, and t |R u ort pcpr) u. We write t V7 u ifs >$ t, s~ u and 


t |R U ort -pcp+(r,pt) u. Furthermore, YV = {(u,t) | t VY u}. 


Lemma 3. Let R be a left-linear TRS. The following two properties hold: 


1. Ift R= s >r u then t V? u. 
2. Ift rR sg u thent V, VP u. 


3.3 Correctness Proof 


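We show that every fair and non-failing finite run results in a $\mathcal{B}$-complete presentation.
To this end, we first verify that inference steps in A preserve convertibility.
We abbreviate $\mathcal{E} \cup \mathcal{R} \cup \mathcal{B}$ to $\mathcal{ERB}$ and $\mathcal{E}' \cup \mathcal{R}' \cup \mathcal{B}$ to $\mathcal{ERB}'$.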


Lemma 4. If (€, R) ka (E',R') then the following inclusions hold: 


= * 


C — —— C e 
R'/B ERB’ ~— ERB 


ERB ~~ R'/B ry 


U=). s 


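Corollary 1. If $(\mathcal{E}, \mathcal{R}) \vdash^*_{\mathsf{A}} (\mathcal{E}', \mathcal{R}')$ then $\leftrightarrow^*_{\mathcal{ERB}}\ =\ \leftrightarrow^*_{\mathcal{ERB}'}$.


Lemma 5. If $(\mathcal{E}, \mathcal{R}) \vdash^*_{\mathsf{A}} (\mathcal{E}', \mathcal{R}')$ and $\mathcal{R} \subseteq\ >$ then $\mathcal{R}' \subseteq\ >$.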


Definition 5. Let $\to$ be a rewrite relation or equivalence relation, $M$ a finite
multiset of terms and $>$ a $\mathcal{B}$-compatible reduction order. We write $s \to^M t$ if $s \to t$
and there exist terms $s', t' \in M$ such that $s' \gtrsim s$ and $t' \gtrsim t$ for $\gtrsim\ =\ > \cup \sim_{\mathcal{B}}$.
We follow the convention that if a conversion is labeled with $M$, all single
steps can be labeled with $M$.
Lemma 6. Let (E,R) Fa (E’,R’) and R' C >. 

M M , 
1. For ony finite n M e have aa C tna 
2. fst then s —> R I. t with {s} >mu N. 

Finally, we are able to prove the correctness result for A, i.e., all finite fair 
and non-failing runs produce a B-complete TRS which represents the original 
set of equations. In contrast to [1] and [3], the proof shows that it suffices to 
consider prime critical pairs. 


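Theorem 2. Let $\mathcal{E}$ be an ES. For every fair and non-failing run
$\mathcal{E}_0, \mathcal{R}_0 \vdash_{\mathsf{A}} \mathcal{E}_1, \mathcal{R}_1 \vdash_{\mathsf{A}} \cdots \vdash_{\mathsf{A}} \mathcal{E}_n, \mathcal{R}_n$
for $\mathcal{E}$, the TRS $\mathcal{R}_n$ is a $\mathcal{B}$-complete representation of $\mathcal{E}$.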


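Proof. Let $>$ be the $\mathcal{B}$-compatible reduction order used in the run. From fairness
we obtain $\mathcal{E}_n = \varnothing$ as well as the fact that $\mathcal{R}_n$ is left-linear. Corollary 1 establishes
$\leftrightarrow^*_{\mathcal{E} \cup \mathcal{B}}\ =\ \leftrightarrow^*_{\mathcal{R}_n \cup \mathcal{B}}$ and termination modulo $\mathcal{B}$ of $\mathcal{R}_n$ follows from Lemma 5. It
remains to prove that $\mathcal{R}_n$ is Church-Rosser modulo $\mathcal{B}$ which we do by showing
peak-and-cliff decreasingness. So consider a labeled local peak $t \mathrel{{}^{M_1}_{\mathcal{R}_n}{\leftarrow}} s \to^{M_2}_{\mathcal{R}_n} u$.
Lemma 3(1) yields $t \mathrel{\triangledown^2_s} u$. Let $v \mathrel{\triangledown_s} w$ appear in this sequence (so $v = t$ or
$w = u$). By definition, $v \downarrow^\sim_{\mathcal{R}_n} w$ or $v \leftrightarrow_{\mathrm{PCP}(\mathcal{R}_n)} w$. Together with fairness,
the fact that $\sim_{\mathcal{B}}$ is reflexive as well as closure of rewriting under contexts and
substitutions we obtain $v \downarrow^\sim_{\mathcal{R}_n} w$ or $(v, w) \in \bigcup_{i=0}^{n} \leftrightarrow_{\mathcal{E}_i \cup \mathcal{R}_i}$. In both cases, it
is possible to label all steps between $v$ and $w$ with $\{v, w\}$. Since $s > v$ and
$s > w$ we have $M_1 >_{\mathrm{mul}} \{v, w\}$ and $M_2 >_{\mathrm{mul}} \{v, w\}$. Repeated applications of
Lemma 6(1) therefore yield a conversion in $\mathcal{R}_n \cup \mathcal{B}$ between $v$ and $w$ where every
step is labeled with a multiset that is smaller than both $M_1$ and $M_2$. Hence, the
corresponding condition required by peak-and-cliff decreasingness is fulfilled.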
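Next consider a labeled local cliff $t \mathrel{{}^{M_1}_{\mathcal{R}_n}{\leftarrow}} s \vdash\!\dashv^{M_2}_{\mathcal{B}} u$. From Lemma 3(2) we
obtain a term $v$ such that $t \mathrel{\triangledown_s} v \mathrel{\triangledown^\sim_s} u$. As in the case for local peaks we obtain a
conversion between $t$ and $v$ where each step can be labeled with $\{t, v\} <_{\mathrm{mul}} M_1$.
Together with fairness, $v \mathrel{\triangledown^\sim_s} u$ yields $v \downarrow^\sim_{\mathcal{R}_n} u$ or $(v, u) \in \bigcup_{i=0}^{n} \leftrightarrow_{\mathcal{R}_i}$. In the
former case there exists a $k$ such that $v \to^*_{\mathcal{R}_n} \cdot \sim_{\mathcal{B}} \cdot {}^{k}_{\mathcal{R}_n}{\leftarrow}\ u$. If $k = 0$ we can
label all steps with $\{v\}$. If $k > 0$ the conversion is of the form
$v \to^*_{\mathcal{R}_n} \cdot \sim_{\mathcal{B}} \cdot {}^{k-1}_{\mathcal{R}_n}{\leftarrow}\ w \mathrel{{}_{\mathcal{R}_n}{\leftarrow}} u$. We can label the rightmost step with $M_2$ and the remaining
steps with $\{v, w\}$. Note that $s > v$. Since $>$ is a $\mathcal{B}$-compatible reduction order
we also have $s > w$. Thus, $M_1 >_{\mathrm{mul}} \{v, w\}$ which establishes the corresponding
condition required by peak-and-cliff decreasingness for all $k$. In the remaining
case we have $(v, u) \in \bigcup_{i=0}^{n} \leftrightarrow_{\mathcal{R}_i}$, so there is some $i \leq n$ such that $v \leftrightarrow_{\mathcal{R}_i} u$.
Actually, we know that $u \to_{\mathcal{R}_i} v$ since otherwise we would have both $s > v$
and $v > s$ by the $\mathcal{B}$-compatibility of $>$. Repeated applications of Lemma 6(1,2)
therefore yield a conversion between $u$ and $v$ of the form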
M2 = N x 
where $\{u\} >_{\mathrm{mul}} N$. By definition, $s' \gtrsim u$ for some $s' \in M_1$ and therefore $M_1 >_{\mathrm{mul}} N$,
which means that the corresponding condition required by peak-and-cliff
decreasingness is fulfilled. Overall, it follows that $\mathcal{R}_n$ is peak-and-cliff decreasing
and therefore Church-Rosser modulo $\mathcal{B}$.


Note that the proofs of the previous theorem and Theorem 1 do not require 
multiset orders induced by quasi-orders but use multiset extensions of proper 
B-compatible reduction orders which are easier to work with. This could be 
achieved by defining peak-and-cliff decreasingness in such a way that well- 
founded orders suffice for the abstract setting. However, the usage of multiset 
orders based on B-compatible reduction orders as well as a notion of labeled
rewriting which allows us to label steps with B-equivalent terms are crucial in
order to establish peak-and-cliff decreasingness for TRSs.


4 Bachmair’s Inference System 


As already mentioned, the inference system proposed by Avenhaus [1] is essentially
the same as A. The only other inference system for B-completion for left-linear
TRSs is due to Bachmair [3]. We investigate a slightly modified version of
this inference system, which we call B: arbitrary local peaks are deducible and,
as we only consider finite runs, the encompassment condition of the collapse
rule is removed.

The main difference between A and B is that in B one may only use the
standard rewrite relation $\to_{\mathcal{R}}$ for simplifying equations and composing rules.
This allows us to deduce local cliffs as equations. The goal of this section is to
establish correctness of B via a simulation by A.


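Definition 6. The inference system B is the same as A but with rewriting in
compose and simplify restricted to $\to_{\mathcal{R}}$ and the following rule which replaces the
two deduction rules of A:


deduce \quad $\dfrac{\mathcal{E}, \mathcal{R}}{\mathcal{E} \cup \{s \approx t\}, \mathcal{R}}$ \quad if $s \mathrel{{}_{\mathcal{R}}{\leftarrow}} \cdot \to_{\mathcal{R} \cup \mathcal{B}^{\pm}} t$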


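Definition 7. Let $\mathcal{E}$ be an ES. A finite sequence

$\mathcal{E}_0, \mathcal{R}_0 \vdash_{\mathsf{B}} \mathcal{E}_1, \mathcal{R}_1 \vdash_{\mathsf{B}} \cdots \vdash_{\mathsf{B}} \mathcal{E}_n, \mathcal{R}_n$

with $\mathcal{E}_0 = \mathcal{E}$ and $\mathcal{R}_0 = \varnothing$ is a run for $\mathcal{E}$. If $\mathcal{E}_n \neq \varnothing$, the run fails. The run is
fair if $\mathcal{R}_n$ is left-linear and the following inclusion holds:


$\mathrm{PCP}(\mathcal{R}_n) \cup \mathrm{PCP}^{\pm}(\mathcal{R}_n, \mathcal{B}^{\pm}) \subseteq\ \downarrow^\sim_{\mathcal{R}_n} \cup\ \bigcup_{i=0}^{n} \leftrightarrow_{\mathcal{E}_i}$
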
In contrast to Definition 2, the fairness condition is the same for all prime 
critical pairs since the inference rule deduce of B never produces rewrite rules. 
In the following, al denotes an application of the rule orient in an inference 
system |. In order to prove that fair and non-failing runs in B can be simulated 
in A, we start with the following technical lemma. 
Lemma 7. If (E1, R1) Fg (E2, R2) and (E1, R1) 5 (EL, RL) then (EL, R1) HE
(EL, RL) where (E2, R2) 5 (EL, R4). In a picture: 
E1, Rı Fg E2, R2 
Te Te 
w* w* 


EVR, FR ERa 


For the proof of the simulation result, we need a slightly different form of the 


previous lemma. Analogous to the notation for rewrite relations, the relation +} 
denotes the exhaustive application of the inference rule orient. 


Corollary 2. If (€1,Ri) Fe (€2,Re2) and (€1,Ri) FR (EL, Ri) then 
(Ei, R1) FA (E2, R4) where (E2, R2) Fe (E3, R3). 


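Theorem 3. For every fair run $(\mathcal{E}, \varnothing) \vdash^*_{\mathsf{B}} (\varnothing, \mathcal{R})$ there exists a fair run
$(\mathcal{E}, \varnothing) \vdash^*_{\mathsf{A}} (\varnothing, \mathcal{R})$.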


Proof. Assume (Eo, Ro) FB (En, Rn) where Ro = En = Ø. By n applications of 
Corollary 2 we arrive at the following situation: 


Eo, Ro Hg E, Rı Fg e Fp En, Rn 

Te To To 

els ale ae 
Eo, Ro Eh EBRI (ER ERI Pe we Pe ER, 


The following two statements hold: 


1. For $0 \leq i \leq n$, all orientable equations in $\mathcal{E}_i$ are in $\mathcal{R}'_i$ (possibly reversed) and
the other equations are in $\mathcal{E}'_i$.


2. $\mathrm{PCP}^{\pm}(\mathcal{R}'_n, \mathcal{B}^{\pm})$ is a set of orientable equations.


Statement (1) is immediate from the definition of the simulation relation and statement (2)
follows from $\mathcal{B}$-compatibility of the used reduction order together with the fact
that every (prime) critical pair is connected by one $\mathcal{R}_n$-step and one $\mathcal{B}$-step.
Furthermore, $\mathcal{E}_n = \varnothing$ implies $\mathcal{E}'_n = \varnothing$ as well as $\mathcal{R}_n = \mathcal{R}'_n$. Hence, we obtain
fairness of the run in A by showing the following inclusions:


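$\mathrm{PCP}(\mathcal{R}'_n) \subseteq\ \downarrow^\sim_{\mathcal{R}'_n} \cup\ \bigcup_{i=0}^{n} \leftrightarrow_{\mathcal{E}'_i \cup \mathcal{R}'_i}$ \qquad $\mathrm{PCP}^{\pm}(\mathcal{R}'_n, \mathcal{B}^{\pm}) \subseteq\ \downarrow^\sim_{\mathcal{R}'_n} \cup\ \bigcup_{i=0}^{n} \leftrightarrow_{\mathcal{R}'_i}$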


Let $s \approx t \in \mathrm{PCP}(\mathcal{R}'_n)$. By fairness of the run in B we obtain $s \downarrow^\sim_{\mathcal{R}_n} t$ or $s \leftrightarrow_{\mathcal{E}_k} t$
for some $k \leq n$. In the former case, we are immediately done. In the latter case
we obtain $s \leftrightarrow_{\mathcal{E}'_k \cup \mathcal{R}'_k} t$ from (1) as desired. Now, let $s \approx t \in \mathrm{PCP}^{\pm}(\mathcal{R}'_n, \mathcal{B}^{\pm})$. By
fairness of the run in B we obtain $s \downarrow^\sim_{\mathcal{R}_n} t$ or $s \leftrightarrow_{\mathcal{E}_k} t$ for some $k \leq n$. Again,
we are immediately done in the former case. In the latter case we have $s \leftrightarrow_{\mathcal{R}'_k} t$
because of (1) and (2). Therefore, the run in A is fair.


The previous theorem is an important simulation result which justifies the 
emphasis on A in this paper. Moreover, together with Theorem 2 the correctness 
of the inference system B is an easy consequence. 


Corollary 3. Every fair and non-failing run for $\mathcal{E}$ in B produces a $\mathcal{B}$-complete
presentation of $\mathcal{E}$.


5 AC Completion 


So far, the theoretical results have been generalized by using the equational the- 
ory B as a placeholder. In practice, however, this paper is concerned with the 
particular theory AC. The results of this section allow us to assess the effective- 
ness of the inference system A in the setting of AC completion. 


5.1 Limitations of Left-Linear AC Completion 


In addition to the restriction to left-linear rewrite rules, the following exam- 
ple demonstrates another severe limitation of the inference system A previously 
unmentioned in the literature. 


Example 4. Consider the ES $\mathcal{E}$ consisting of the equations

and(0, 0) ≈ 0    and(1, 1) ≈ 1    and(0, 1) ≈ 0


where and is an AC function symbol. There is only one way to orient each
equation. Furthermore, there are no critical pairs between the resulting rewrite
rules. Hence, using the inference system A we arrive at the intermediate TRS


and(0, 0) → 0    and(1, 1) → 1    and(0, 1) → 0


where the only possible next step is to deduce local cliffs. We will now show that
this has to be done infinitely many times. Note that an AC-complete presentation
$\mathcal{R}$ of $\mathcal{E}$ has to be able to rewrite any AC-equivalent term of a redex. Consider
the infinite family of terms


s0 = and(0, 1)    s1 = and(and(0, x1), 1)    s2 = and(and(and(0, x1), x2), 1)    ···

as well as

t0 = 0    t1 = and(0, x1)    t2 = and(and(0, x1), x2)    ···


Clearly, $s_n \leftrightarrow^*_{\mathcal{E} \cup \mathrm{AC}} t_n$ for all $n \in \mathbb{N}$ and therefore also $s_n \downarrow^\sim_{\mathcal{R}} t_n$ for all $n \in \mathbb{N}$, but
this demands infinitely many rules in $\mathcal{R}$: For each $s_n$ there is an AC-equivalent
term in which the constants 0 and 1 are next to each other, which allows us
to rewrite it using the rule and(0, 1) → 0. However, with n the number of
variables between these constants also increases, which requires $\mathcal{R}$ to have infinitely
many rules since rewrite rules can only be applied before the representation
modulo AC is changed.
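
The phenomenon in Example 4 can be observed with a few lines of code. The following
sketch is an illustration under simplifying assumptions, not part of accompll: terms are
nested tuples, `and` is the only function symbol, and the helper names (`s`, `rearranged`,
`ac_equivalent`, `has_syntactic_redex`) are hypothetical. For terms built from a single AC
symbol, two terms are AC-equivalent exactly when they have the same multiset of leaves,
which is what the check below relies on.

```python
from collections import Counter

AND = "and"
LHS = (AND, "0", "1")  # left-hand side of the rule and(0, 1) -> 0


def s(n):
    """Build s_n = and(...and(and(0, x1), x2)..., 1) as nested tuples."""
    term = "0"
    for i in range(1, n + 1):
        term = (AND, term, f"x{i}")
    return (AND, term, "1")


def rearranged(n):
    """An AC-equivalent variant of s_n in which 0 and 1 are adjacent."""
    term = LHS
    for i in range(1, n + 1):
        term = (AND, term, f"x{i}")
    return term


def leaves(term):
    """Multiset of constants and variables occurring in the term."""
    if isinstance(term, str):
        return Counter([term])
    _, left, right = term
    return leaves(left) + leaves(right)


def ac_equivalent(a, b):
    """AC-equivalence for terms over the single AC symbol `and`."""
    return leaves(a) == leaves(b)


def has_syntactic_redex(term):
    """True if some subterm is literally and(0, 1), i.e. the ground
    rule applies without any AC reasoning."""
    if term == LHS:
        return True
    if isinstance(term, str):
        return False
    return any(has_syntactic_redex(arg) for arg in term[1:])


if __name__ == "__main__":
    for n in range(4):
        print(n, has_syntactic_redex(s(n)), ac_equivalent(s(n), rearranged(n)))
```

For every n > 0 the term s_n contains no syntactic instance of and(0, 1) although an
AC-equivalent rearrangement does, so plain rewriting needs a separate rule variant for
each n.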


Note that there is nothing special about this example except the fact that it
contains at least one equation which can only be oriented such that the left-hand
side contains an AC function symbol where both arguments have “structure”,
i.e., both arguments represent more complicated terms than a variable. As a
consequence, the necessity of infinitely many rules applies to all equational systems
which have this property. Hence, for a large class of equational
systems the corresponding AC-canonical presentation (in the left-linear sense)
is infinite if it exists. This observation is in stark contrast to the properties of
general AC completion as presented in the next section which can complete the
ES $\mathcal{E}$ from Example 4 into a finite AC-canonical TRS by simply orienting all
rules from left to right.


5.2 General AC Completion 


Inference systems for completion modulo an equational theory which are not 
restricted to the left-linear case usually need more inference rules than the ones 
already covered in this paper. For general AC completion, however, there exists 
a particularly simple inference system which constitutes a special case of nor- 
malized completion [12] and can be found in Sarah Winkler’s PhD thesis [16, 
p. 109]. 


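Definition 8. The inference system KBac is the same as A for the fixed theory
AC but with a modified collapse rule which allows us to rewrite with $\to_{\mathcal{R}/\mathrm{AC}}$ and
the following rule which replaces the two deduction rules of A:


deduce \quad $\dfrac{\mathcal{E}, \mathcal{R}}{\mathcal{E} \cup \{s \approx t\}, \mathcal{R}}$ \quad if $s \mathrel{{}_{\mathcal{R}}{\leftarrow}} \cdot \sim_{\mathrm{AC}} \cdot \to_{\mathcal{R}} t$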


The purpose of this section is to show how A can be simulated by KBac in
the case of $\mathcal{B} = \mathrm{AC}$. Since local cliffs cannot be deduced in KBac, the simulation
has to work with a potentially smaller set of rewrite rules. Furthermore, during
a run, the variants of rules stemming from local cliffs may be in different states
with respect to inter-reduction (collapse and compose). Given an intermediate
TRS $\mathcal{R}$ of a run in A as well as an intermediate TRS $\mathcal{R}'$ of a run in KBac, the
invariant $\mathcal{R} \subseteq\ \to^+_{\mathcal{R}'/\mathrm{AC}}$ resolves both of the aforementioned problems. The main
motivation behind this invariant is the avoidance of compose and collapse in the
KBac run.

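Lemma 8. If $(\mathcal{E}_1, \mathcal{R}_1) \vdash_{\mathsf{A}} (\mathcal{E}_2, \mathcal{R}_2)$ and $\mathcal{R}_1 \subseteq\ \to^+_{\mathcal{R}'_1/\mathrm{AC}}$ then there exists a TRS
$\mathcal{R}'_2$ such that $(\mathcal{E}_1, \mathcal{R}'_1) \vdash^*_{\mathsf{KB}_{\mathrm{AC}}} (\mathcal{E}_2, \mathcal{R}'_2)$ and $\mathcal{R}_2 \subseteq\ \to^+_{\mathcal{R}'_2/\mathrm{AC}}$.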
Proof. Let $>$ be a fixed AC-compatible reduction order which is used in both A
and KBac. Suppose $(\mathcal{E}_1, \mathcal{R}_1) \vdash_{\mathsf{A}} (\mathcal{E}_2, \mathcal{R}_2)$ and $\mathcal{R}_1 \subseteq\ \to^+_{\mathcal{R}'_1/\mathrm{AC}}$. We proceed by a
case analysis on the rule applied in the inference step $(\mathcal{E}_1, \mathcal{R}_1) \vdash_{\mathsf{A}} (\mathcal{E}_2, \mathcal{R}_2)$. The
only interesting cases are when deduce, simplify, compose, or collapse is applied.


— If deduce is applied, we further distinguish whether it was applied to a local 
peak or cliff. In the case of a local cliff, we have €; = E and Ro = R,U{l > r} 
with €>R, /ac r. From £ SR, sac r and Ri C RAC we obtain £ RI jac Ts 


Thus, Rə C Rr Jac holds. As (£1, R1) Hke, (E2, R4) is trivial, the claim 
follows. In the case of a local peak, we have Ri = Rz and Ez = E1 U {t = u} 
with t rR, S >R, u. Since Ry C =k; /AC holds, we have t RIJA 
© NAC S MAC © SOR, W RI JAC u for some v and w. By performing deduce and 


=U R 
simplify steps 
(E1, RI) FKBac (E U {v x w}, RI) HFkBac (Ey U {t od u}, Ri) E (E2, R1) 


is obtained. As Rı = Re, the inclusion Rz C =R; /AC is trivial. Hence, the 
claim holds. 

— If simplify is applied, we have Ry = Ro, €& = Eo U {s ~ t}, and E = 

Eo U{s' = t'} with s >R, s’ and t >R, t. By Ri C =R; Jac we have 

S >R, JAC s’ and t >R; Jac t’. Therefore, performing simplify, we obtain 

(E1, R1) kea (E2, R1). As Ri = Ro, the inclusion Rə C >R jac ÍS trivial. 


— If compose is applied, we can write €; = E2, Ri = Ro U {£ —> r}, and 
Rə = Ro U {L > r'} with r >rayac 7. We have (£1, R1) kea (E2, R1). 

; ; ; + ; au E , 
Since the inclusions Ro C Ry C Ri JAC yield £ >Rijac T Ri jac T We 
obtain Rə C —> 


+ 

RI JAC: 

— If collapse is applied, we can write E2 = E1 U {V ~ r} and Ri = RoW {L> r} 
with l >r, l. By Ro C Ri © Rr Jac we have 


7 * * 
£ RI /AC t RiT “AC £ ~AC PRI URI ac T 
for some t and u. Performing deduce and simplify, we obtain: 


(E1, R1) Ekes: (E1 U {t ~ u}, R1) Few (E1 U LE © r}, Ri) = (E2, R1) 


By Ro CRıC Rr Jac the claim is concluded. 


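Theorem 4. For every fair run $(\mathcal{E}, \varnothing) \vdash^*_{\mathsf{A}} (\varnothing, \mathcal{R})$ there exists a run $(\mathcal{E}, \varnothing) \vdash^*_{\mathsf{KB}_{\mathrm{AC}}} (\varnothing, \mathcal{R}')$
such that $\mathcal{R}'/\mathrm{AC}$ is an AC-complete presentation of $\mathcal{E}$.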


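Proof. With a straightforward induction argument, we obtain the run
$(\mathcal{E}, \varnothing) \vdash^*_{\mathsf{KB}_{\mathrm{AC}}} (\varnothing, \mathcal{R}')$ as well as $\mathcal{R} \subseteq\ \to^+_{\mathcal{R}'/\mathrm{AC}}$ $(*)$ from Lemma 8. Furthermore,
AC termination of $\mathcal{R}'$ and $\leftrightarrow^*_{\mathcal{E} \cup \mathrm{AC}}\ =\ \leftrightarrow^*_{\mathcal{R}' \cup \mathrm{AC}}$ $(**)$ are easy consequences from
the definition of KBac. AC-completeness of $\mathcal{R}$ follows from fairness of the run
in A and Theorem 2. For the Church-Rosser modulo AC property of $\mathcal{R}'/\mathrm{AC}$,
consider a conversion $s \leftrightarrow^*_{\mathcal{R}' \cup \mathrm{AC}} t$. From $(**)$ we obtain $s \leftrightarrow^*_{\mathcal{E} \cup \mathrm{AC}} t$ and therefore
$s \to^*_{\mathcal{R}} \cdot \sim_{\mathrm{AC}} \cdot {}^*_{\mathcal{R}}{\leftarrow}\ t$ by the fact that $\mathcal{R}$ is an AC-complete presentation of $\mathcal{E}$.
Finally, $(*)$ yields $s \to^*_{\mathcal{R}'/\mathrm{AC}} \cdot \sim_{\mathrm{AC}} \cdot {}^*_{\mathcal{R}'/\mathrm{AC}}{\leftarrow}\ t$ as desired. Thus, $\mathcal{R}'/\mathrm{AC}$ is an
AC-complete presentation of $\mathcal{E}$.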


In addition to the result of the previous theorem, the proof of Lemma 8 
provides a procedure to construct a KBac run which “corresponds” to a given 
A run. In particular, this means that it is possible to switch from A to KBac at 
any point while performing AC completion. This is of practical relevance: Assume 
that AC completion is started with A in order to avoid AC unification. If A gets 
stuck due to simplified equations which are not orientable into a left-linear rule or 
it seems to be the case that the procedure diverges due to the problem described 
in Example 4, starting from scratch with KBac is not necessary. We conclude 
the section by illustrating the practical relevance of the simulation result with 
an example. 


Example 5. Consider the ES $\mathcal{E}$ for abelian groups consisting of the equations

e · x ≈ x    x⁻ · x ≈ e


where · is an AC symbol. Note that the well-known completion run for non-abelian
group theory is also a run in A: Critical pairs with respect to the associativity
axiom are deducible via local cliffs, non-left-linear intermediate rules are
allowed and all (intermediate) rules are orientable with e.g. AC-KBO. Hence, we
obtain the TRS $\mathcal{R}'$ consisting of the rules


1 e-r—- 2 6: Ge gr 
2 Gene T3 ree oe 
3: Le 8: e ce 
4: @ -(@-y)y 9: a(x -y)oy 
5 (ey) aya 


and switch to KBac where we can collapse the redundant rules 4, 6, 7 and 9. A
final joinability check of all AC critical pairs reveals that the resulting TRS $\mathcal{R}$
is an AC-complete presentation of abelian groups. Hence, the simulation result
allows us to make progress with A even when it is doomed to fail. In particular,
critical pairs between rules whose left-hand sides do not contain AC symbols do
not need to be recomputed.


6 Implementation 


To the best of our knowledge, our tool accompll is the first implementation of
left-linear AC completion. It is written in Haskell and available on its website¹.
Instead of expecting explicit AC-compatible reduction orders as input,
accompll performs completion with termination tools [15]. In principle, completion
with termination tools has to consider all combinations of possible orientations
of equations in order to find a complete system. However, traversing
the whole search space is rather inefficient. The state of the art for solving
this problem efficiently is multi-completion with termination tools due to
Winkler et al. [20]. Since the implementation of this method is a major effort,
accompll adopts a simple but incomplete strategy presented in [14]: Instead of
traversing the whole search space, accompll runs two threads in parallel, one of
which prefers to orient equations from left to right and the other from right to
left. If one of the threads finishes successfully, the corresponding result is
reported. Completion fails if both threads fail.
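
The two-thread strategy can be sketched as follows. This is a hedged illustration in
Python rather than the Haskell code of accompll: the callback `complete(equations,
preference)` and the toy stand-in `toy_complete` are hypothetical, and the sketch only
shows the control flow described above (report the first success, fail if both
preferences fail).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def complete_in_parallel(complete, equations):
    """Run `complete` once per orientation preference and return the
    first successful result, or None if both attempts fail.

    `complete(equations, preference)` is an assumed callback that returns
    a completed rule set, or None on failure.
    """
    preferences = ("left-to-right", "right-to-left")
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(complete, equations, p) for p in preferences]
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                return result
    return None


def toy_complete(equations, preference):
    """Hypothetical stand-in: pretend only right-to-left orientation works."""
    if preference == "right-to-left":
        return [(rhs, lhs) for (lhs, rhs) in equations]
    return None


if __name__ == "__main__":
    print(complete_in_parallel(toy_complete, [("a", "b")]))
```

Note that leaving the `with` block waits for the slower thread to finish; a real
implementation would presumably cancel it instead.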

As input, the tool expects a file in the WST² format describing the equational
theory on which left-linear AC completion should be performed. The user can
choose whether $\to_{\mathcal{R}}$ or $\to_{\mathcal{R}/\mathrm{AC}}$ is used for rewriting in the inference rules simplify
and compose. Furthermore, the generation of critical pairs can be restricted
to the primality criterion.

Another feature is the validity problem solving mode which solves a given
instance of the validity problem for an equational theory $\mathcal{E}$ upon successful
completion of $\mathcal{E}$. This mode can be triggered by supplying a concrete equation s ≈ t
as a command line argument in addition to the file describing $\mathcal{E}$.

In the tool accompll, external termination tools do much of the heavy lifting.
In particular, the user can supply the executable of an arbitrary termination tool
as long as the output starts with YES, MAYBE, NO or TIMEOUT (all other cases are
treated as an error). The input format for the termination tool can be set by a
command line argument. The available options are the WST format as well as
the XML format of the Nagoya Termination Tool [21]³.
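
The output convention for termination tools described above can be checked with a small
helper. The sketch below is an assumption-laden illustration, not the accompll source:
the function names are hypothetical, and the only behavior taken from the text is that
the output must start with one of the four keywords and that everything else counts as
an error.

```python
VERDICTS = ("YES", "MAYBE", "NO", "TIMEOUT")


def parse_verdict(output):
    """Return the verdict keyword that the tool output starts with.

    Raises ValueError for any other output, mirroring the rule that all
    other cases are treated as an error.
    """
    for verdict in VERDICTS:
        if output.startswith(verdict):
            return verdict
    raise ValueError("unexpected termination tool output: %r" % output[:40])


def proves_termination(output):
    """Only YES establishes termination; MAYBE, NO and TIMEOUT do not."""
    return parse_verdict(output) == "YES"


if __name__ == "__main__":
    print(parse_verdict("YES\n(proof details omitted)"))
    print(proves_termination("MAYBE"))
```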

Since starting a new process for every call of the termination tool causes a
lot of operating system overhead, the tool supports an interactive mode which
allows it to communicate with a single process of the termination tool in a
dialogue style. Here, the only constraint for the termination tool is that it accepts
a sequence of termination problems separated by the keyword (RUN). This is
currently only implemented in an experimental version of Tyrolean Termination
Tool 2 (TTT2) [11], but we hope that more termination tools will follow as this
approach has a positive effect on the runtime of completion with termination
tools while demanding comparatively little implementation effort.
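
The framing used in the interactive mode might look as follows. This is only a sketch:
the text states that problems are separated by the keyword (RUN), while the placement of
the keyword on its own line, the trailing newline and the function names are assumptions
made for illustration. How the termination tool reports its answers is not modeled here.

```python
SEPARATOR = "(RUN)"


def frame_problems(problems):
    """Join termination problems into one input stream, separated by the
    keyword (RUN) on a line of its own (the line layout is an assumption)."""
    return ("\n" + SEPARATOR + "\n").join(p.strip() for p in problems) + "\n"


def split_problems(stream):
    """Inverse of frame_problems: recover the individual problems."""
    problems = []
    current = []
    for line in stream.splitlines():
        if line.strip() == SEPARATOR:
            problems.append("\n".join(current))
            current = []
        else:
            current.append(line)
    problems.append("\n".join(current))
    return problems


if __name__ == "__main__":
    stream = frame_problems(["(VAR x)\n(RULES f(x) -> x)", "(RULES a -> b)"])
    print(stream)
```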


7 Experimental Results 


The problem set used for the experimental results consists of 50 ESs. It is based 
on the one used in [18] and has been extended by further examples from the 
literature as well as handcrafted examples. The experiments were performed on 


¹ https://github.com/niedjoh/accompll.
² https://www.lri.fr/~marche/tpdb/format.html.
³ https://www.trs.cm.is.nagoya-u.ac.jp/NaTT/natt-xml.html.
Table 1. Experimental results on 50 problems (excerpt)


                      accompll (TTT2)   accompll (TTT2e)   MædMax          mkbTT
                      (1)      (2)      (1)      (2)       (1)     (2)     (1)     (2)

ℕ (+, ×)              0.85     10       0.28     10        18.78   5       ∞
ℕ (+, −, ×, ÷)        1.74     15       0.42     15        ∞               60.06   ?ᵃ
[1, Ex. 4.2.15(b)]    0.48     4        0.24     4         0.01    3       0.19
abelian groups        ⊥                 ⊥                  0.16    5       0.14
[7, Ex. 2]            ∞                 ∞                  0.04    5       0.44    3

problems solved       16                16                 22              35


ᵃ mkbTT does not output the completed system for unknown reasons.


an Intel Core i7-7500U running at a clock rate of 2.7 GHz with 15.5 GiB of main
memory. Our tool accompll was used with the termination tool TTT2 as well as an
experimental version (denoted by TTT2e) which allows our tool to communicate
a sequence of termination problems without having to start a new process all
the time, as described in the preceding section.

Table 1 shows some interesting results and compares the two configurations
of accompll with the normalized completion [12] mode of mkbTT [19]
and the AC completion mode of MædMax [17]. The tool mkbTT is the original
implementation of multi-completion with termination tools [20]. MædMax,
on the other hand, implements maximal completion [9] which makes use of
MaxSAT/MaxSMT solvers instead of termination tools in order to avoid using
concrete reduction orders as input. To the best of our knowledge, there is no other
comparable completion tool which supports AC axioms. Since normalized completion
subsumes general AC completion, a comparison with the aforementioned modes
of both systems allows us to assess the effectiveness of accompll with respect to
the state of the art in AC completion. Note that normalized completion uses AC
unification.

In Table 1, columns (1) show the execution time in seconds where ∞ denotes
that the timeout of 60 s has been reached and ⊥ denotes failure of completion.
Columns (2) state the number of rules of the completed TRS. The first two
problems show that the avoidance of AC unification can indeed have a positive
effect on the execution time. However, the third problem indicates that there may
also be an opposite effect on small problems. The last two problems show the two
main limitations of left-linear AC completion: Abelian groups do not have an
AC-complete presentation which is left-linear and Example 2 from [7] is a ground
ES which causes left-linear AC completion to suffer from the problem described
in Example 4 by definition. The severity of these limitations is reflected in the
total number of solved problems. In particular, the problem set does not contain
an ES which is completed only by accompll. However, given Theorem 4, this
is not unexpected. Another noteworthy but unsurprising fact is that complete
systems produced by accompll tend to have more rules since every rule needs
different versions of left-hand sides to facilitate rewriting without AC-matching.
It would also be interesting to compare the execution times for typical queries
of the form $\mathcal{E} \models s \approx t$ as the resulting systems of left-linear AC completion allow
for more efficient joinability checks using $\to_{\mathcal{R}}$ instead of $\to_{\mathcal{R}/\mathrm{AC}}$. We leave this
for future work.

The complete results are available on the tool’s website⁴. We conclude with
some additional notes on the results.


— The results are not cluttered with detailed results for the available options 
regarding prime critical pairs and the concrete rewrite relation used for sim- 
plify and compose since they did not lead to significant runtime differences. 
Instead, the default options (no prime critical pairs and the rewrite relation
$\to_{\mathcal{R}}$) were used for the experiments.

— The second problem in Table 1 shows the merits of using termination tools 
as it includes round-up division which cannot be handled by simplification 
orders. 

— Due to the incompleteness of the used approach for completion with termi- 
nation tools, some equations in the problems A95_ex4_2_4a.trs as well as 
sp.trs had to be reversed in order to get appropriate results. Note that this 
does not distort the experimental results for left-linear AC completion in gen- 
eral as the problem lies in the particular implementation of completion with 
termination tools. 


8 Conclusion 


In this paper, we consolidated the existing literature for left-linear AC comple- 
tion in the case of finite runs and gave new insight into its merits compared to 
general AC completion. Furthermore, our implementation accompll allowed us 
to run practical experiments. An extended version of this paper with full proof 
details and an appendix which describes the original inference systems of Aven- 
haus and Bachmair is available on the website of accompll (see Footnote 4). We 
conclude by giving some pointers for future work. First of all, the merits of our 
novel simulation result for general AC completion could be evaluated experimen- 
tally by providing an implementation. Another interesting research direction is 
normalized completion for the left-linear case. If successful, this would facilitate 
the treatment of important cases such as abelian groups despite the restriction 
to left-linear TRSs. Furthermore, a formalization of the established theoretical 
results is desirable. To that end, the existing Isabelle/HOL formalization from 
[4] is a perfect starting point as some results of this paper are extensions of the 
results for standard rewriting presented there. 
Abstract. We study the P-interpolation property for certain local theory
extensions, and use these results for proving ≤-interpolation in classes
of semilattices with monotone operators. For computing the ≤-interpolating
terms, we use a hierarchic approach. We use these results for the
study of ⊑-interpolation in the description logics EL and EL⁺.


1 Introduction 


In this paper we study the problem of P-interpolation, a problem strongly related
to interpolation w.r.t. logical theories. The problem can be formulated as follows:
Let $\mathcal{T}$ be a theory, A and B be conjunctions of ground literals in the signature
of $\mathcal{T}$, possibly with additional constants, P a predicate symbol in the signature
of $\mathcal{T}$, a a constant occurring in A and b a constant occurring in B. Assume that
$A \wedge B \models_{\mathcal{T}} a P b$. Can we find a ground term t containing only constants and
function symbols “shared” by A and B, such that $A \wedge B \models_{\mathcal{T}} a P t \wedge t P b$?
Interpolation has been studied in classical and non-classical logics and in
extensions and combinations of theories; it is very important in program verification
and also in the area of description logics. The first algorithms for interpolant
generation in program verification required explicit constructions and
“separations” of proofs [14,16]. In [13] interpolants are computed using variants
of resolution. For certain theories, the “separation” of proofs relied on the
possibility of “separating” atoms, i.e. on P-interpolation. Equality interpolation is
used in [34] for devising an interpolation method in combinations of theories
with disjoint signatures. In [22,24] and [19], for instance, we consider interpolation
problems in certain classes of extensions $\mathcal{T}_0 \cup \mathcal{K}$ of a base theory $\mathcal{T}_0$ and
use a hierarchical approach to compute interpolants. The method relies on the
P-interpolation property of the base theory $\mathcal{T}_0$. In most of the applications we
considered, P is the equality predicate ≈ or a predicate ≤ with the property
that in all models of $\mathcal{T}_0$, the interpretation of ≤ is a partial ordering. Since at
that time our main interest was the study of interpolation problems, in [22,24]
and [19] P-interpolation is only used in order to help in giving methods for
interpolation and not as a goal in itself. However, in several papers in the area
of description logics (cf. e.g. [8,31]) when defining the notion of interpolation
in description logics the authors define in fact a notion of ⊑-interpolation. In
[8] (Thm. 4) it is proved that EL⁺ allows interpolation (in fact, the notion of
⊑-interpolation mentioned above) for safe role inclusions; this is related to the
notion of “sharing” considered in [24], cf. also Sect. 4. The proof technique in [8]
uses simulations. In this paper, we analyze the property of P-interpolation in
theory extensions, propose a method for solving it based on hierarchical reasoning
and satisfiability modulo theories, and formulate the ⊑-interpolation problem
for EL and EL⁺ as a ≤-interpolation problem in a theory of semilattices with
operators. We first studied ≤-interpolation in [17] in the context of description
logics; the ⊑-interpolating concept descriptions were regarded as a form of “high-level”
explanations. In this paper we further extend the work in [17]. The general
approach we propose opens the possibility of applying similar methods to more
general classes of non-classical logics (including e.g. substructural logics or the
logics with monotone operators studied in [27,28]) or in verification (to consider
more general theory extensions than those with uninterpreted function symbols
analyzed in [19]). The main results can be summarized as follows:


We propose variants of the definitions of convexity, P-interpolation and Beth 
definability relative to a subsignature. 

We describe a hierarchical P-interpolation method in certain classes of local 
theory extensions. 

We illustrate the applicability of these results to prove that certain classes
of semilattices with monotone operators have the property of ≤-interpolation
for a certain interpretation of “shared” function symbols.

We show, by giving a counterexample, that ≤-interpolation does not hold if
by “shared” symbols we mean just the common symbols.

We indicate how these results can be used to prove or disprove various notions
of interpolation for the description logics EL and EL⁺.


Structure of the Paper: In Sect. 2 and 3 basic notions are introduced, and some
results needed later are proved. In Sect. 4 we identify classes of local theory
extensions allowing P-interpolation and propose a hierarchical method of computing
P-interpolants. This is used in Sect. 5 to study the existence of ≤-interpolation
in classes of semilattices with monotone operators. In Sect. 6 we use the links
between the theory of semilattices with operators and the description logics EL
and EL⁺, and show how the results can be used in the study of these logics. The
details of the proofs and additional examples can be found in [18].


2 Theories, Convexity, P-Interpolation, Beth Definability 


We assume known standard definitions from first-order logic such as $\Pi$-structures,
models, homomorphisms, logical entailment, satisfiability, unsatisfiability.

We consider signatures of the form $\Pi = (\Sigma, \mathsf{Pred})$, where $\Sigma$ is a family of
function symbols and $\mathsf{Pred}$ a family of predicate symbols. In this paper, a theory
$\mathcal{T}$ is described by a set of closed formulae (the axioms of the theory). We call a
theory axiomatized by a set of (universally quantified) equations an equational
theory. In this paper, we denote by $\mathsf{Mod}(\mathcal{T})$ the set of all models of $\mathcal{T}$. We denote
“falsum” with $\bot$. If $F$ and $G$ are formulae we write $F \models G$ (resp. $F \models_{\mathcal{T}} G$) to
express the fact that every model of $F$ (resp. every model of $F$ which is also a
model of $\mathcal{T}$) is a model of $G$. The definitions can be extended in a natural way
to the case when $F$ is a set of formulae; in this case, $F \models_{\mathcal{T}} G$ if and only if
$\mathcal{T} \cup F \models G$. $F \models \bot$ means that $F$ is unsatisfiable; $F \models_{\mathcal{T}} \bot$ means that there is
no model of $\mathcal{T}$ which is also a model of $F$. If there is a model of $\mathcal{T}$ which is also
a model of $F$ we say that $F$ is $\mathcal{T}$-consistent. If $C$ is a fixed countable set of fresh
constants, we denote by $\Pi^C$ the extension of $\Pi$ with constants in $C$.


Convexity and P-Convexity. We can define a notion of convexity w.r.t. a 
subset P of the set of predicates. 


Definition 1. A theory $\mathcal{T}$ with signature $\Pi = (\Sigma, \mathsf{Pred})$ is convex with respect
to a subset $P$ of $\mathsf{Pred}$ (which may include also equality $\approx$) if for all conjunctions
$\Gamma$ of ground $\Pi^C$-atoms (with additional constants in a set $C$), relations
$R_1, \dots, R_m \in P$ and tuples of $\Pi^C$-terms of corresponding arity $\bar{t}_1, \dots, \bar{t}_m$ such
that $\Gamma \models_{\mathcal{T}} \bigvee_{i=1}^{m} R_i(\bar{t}_i)$ there exists $i_0 \in \{1, \dots, m\}$ such that $\Gamma \models_{\mathcal{T}} R_{i_0}(\bar{t}_{i_0})$.


We will call a theory $\mathcal{T}$ convex if it is $\mathsf{Pred} \cup \{\approx\}$-convex. The following result is
well-known (cf. e.g. [5, 10, 32]):


Theorem 1. Let T be a theory and let Mod(T) be the class of models of T. 


(i) If Mod(T) is closed under direct products then T is convex. 
(ii) If T is a universal theory and T is convex, then T has an axiomatization 
given by Horn clauses, hence Mod(T) is closed under direct products. 


Corollary 2. Let $\mathcal{T}_1, \mathcal{T}_2$ be two theories with signatures $\Pi_1, \Pi_2$. If $\mathsf{Mod}(\mathcal{T}_1)$ and
$\mathsf{Mod}(\mathcal{T}_2)$ are closed under direct products, then $\mathcal{T}_1 \cup \mathcal{T}_2$ is convex.


Proof: Follows from the fact that if $\mathsf{Mod}(\mathcal{T}_1)$ and $\mathsf{Mod}(\mathcal{T}_2)$ are closed under direct
products then so is $\mathsf{Mod}(\mathcal{T}_1 \cup \mathcal{T}_2)$, and from Theorem 1.


From Theorem 1 and Corollary 2 it immediately follows that if $\mathcal{T}_1$ and $\mathcal{T}_2$ are
universal theories and convex then $\mathcal{T}_1 \cup \mathcal{T}_2$ is convex. In particular, every extension
of a convex universal theory $\mathcal{T}_0$ with a set of new function symbols axiomatized
by a set $K$ of Horn clauses is convex.


Equality Interpolation, R-Interpolation. We say that a convex theory $\mathcal{T}$
has the equality interpolation property if for every conjunction of ground $\Pi^C$-literals
$A(\bar{c}, \bar{a}_1, a)$ and $B(\bar{c}, \bar{b}_1, b)$, if $A \wedge B \models_{\mathcal{T}} a \approx b$ then there exists a term $t(\bar{c})$
containing only the shared constants $\bar{c}$ such that $A \wedge B \models_{\mathcal{T}} a \approx t(\bar{c}) \wedge t(\bar{c}) \approx b$.

Sometimes, the theories and theory extensions we study contain interpreted
symbols in a set $\Pi_0 = (\Sigma_0, \mathsf{Pred})$ and non-interpreted function symbols in a set
$\Sigma_1$. The classical definition for equality interpolation for a theory $\mathcal{T}$ mentioned
above allows the term $t(\bar{c})$ to contain all function symbols in the signature of $\mathcal{T}$;
these symbols are in this case all seen as being interpreted. If we distinguish
between interpreted and uninterpreted functions we might require that the
intermediate term $t(\bar{c})$ contains only “shared” uninterpreted functions and common
constants.

If $\Sigma_A$ and $\Sigma_B$ are the uninterpreted function symbols occurring in $A$ resp. $B$,
and $\Theta$ is a closure operator, by “shared” uninterpreted functions we can mean:


— Intersection-shared symbols: $\cap$-Shared$(A, B) = \Sigma_A \cap \Sigma_B$, or
— $\Theta$-shared symbols: $\Theta$-Shared$(A, B) = \Theta(\Sigma_A) \cap \Theta(\Sigma_B)$.


Example 1. Let $\mathcal{T} = \mathcal{T}_0 \cup K$ be the extension of a theory $\mathcal{T}_0$ with set of
interpreted function symbols $\Sigma_0$ with a set $K$ of clauses containing new
uninterpreted function symbols in a set $\Sigma_1$. If $A$ and $B$ are sets of atoms in the
signature of $\mathcal{T}$ containing additional constants in a set $C$ and uninterpreted
function symbols $\Sigma_A$, $\Sigma_B$ then the intersection-shared uninterpreted function
symbols of $A$ and $B$ are $\Sigma_A \cap \Sigma_B$. Let $\Theta_K$ be defined for every $X \subseteq \Sigma_1$ by
$\Theta_K(X) = \bigcup_{f \in X} \{ g \in \Sigma_1 \mid g \sim_K f \}$, where $\sim_K$ is the equivalence relation induced
by the relation $f \sim g$ iff there exists $C \in K$ s.t. $f, g$ both occur in $C$.

Then the $\Theta_K$-shared symbols are $\Theta_K(\Sigma_A) \cap \Theta_K(\Sigma_B)$. In particular, if $A$ contains
a function symbol $f$ and $B$ contains a symbol $g$ such that $f, g$ occur both in a
clause in $K$, then $f$ and $g$ are considered to be $\Theta_K$-shared by $A$, $B$.
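
The closure operator of Example 1 can be sketched in a few lines of Python. This is only an illustration under assumptions not made in the text: each clause of $K$ is represented by the set of uninterpreted function symbols occurring in it, and the names `theta` and `theta_shared` are hypothetical.

```python
def theta(clause_symbols, sigma1, xs):
    """Theta_K(X): all symbols of sigma1 linked to some symbol of X.

    clause_symbols: one set of uninterpreted symbols per clause of K
    (an assumed encoding; the clauses themselves are not represented).
    """
    parent = {s: s for s in sigma1}

    def find(s):
        # Follow parent links up to the class representative.
        while parent[s] != s:
            s = parent[s]
        return s

    # Symbols occurring together in a clause end up in the same class,
    # which yields the equivalence relation induced by co-occurrence.
    for symbols in clause_symbols:
        symbols = [s for s in symbols if s in parent]
        for s, t in zip(symbols, symbols[1:]):
            parent[find(s)] = find(t)

    roots = {find(f) for f in xs if f in parent}
    return {g for g in sigma1 if find(g) in roots}


def theta_shared(clause_symbols, sigma1, sig_a, sig_b):
    """Theta_K-shared symbols of A and B."""
    return theta(clause_symbols, sigma1, sig_a) & theta(clause_symbols, sigma1, sig_b)


# One clause mentioning f and g; A uses only g, B uses only f.
K = [{"f", "g"}]
SIGMA1 = {"f", "g", "h"}
print(sorted({"g"} & {"f"}))                          # intersection-shared
print(sorted(theta_shared(K, SIGMA1, {"g"}, {"f"})))  # Theta_K-shared
```

In this toy instance intersection-sharing yields no shared symbol, whereas $\Theta_K$-sharing makes both $f$ and $g$ available for the intermediate term.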


We also might be interested in similar properties for other binary relations. We
define an $R$-interpolation property, where $R$ is a binary predicate symbol in $\Pi$.


Definition 2. Let $R \in \mathsf{Pred} \cup \{\approx\}$ be a binary predicate symbol. An $\{R\}$-convex
theory $\mathcal{T}$ with uninterpreted symbols $\Sigma_1$ has the $R$-interpolation property if for
all conjunctions of ground atoms $A(\bar{c}, \bar{a}_1, a)$ and $B(\bar{c}, \bar{b}_1, b)$, if $A \wedge B \models_{\mathcal{T}} a R b$ then
there exists a term $t(\bar{c})$ containing only common constants $\bar{c}$ and only “shared”
uninterpreted symbols in $\Sigma_1$ such that $A \wedge B \models_{\mathcal{T}} a R t(\bar{c}) \wedge t(\bar{c}) R b$.


If $P \subseteq \mathsf{Pred}$, we say that a theory has the P-interpolation property if it has the
$R$-interpolation property for every $R \in P$. In Sect. 5 we give examples of theories
with this property and show that a theory may not have the $R$-interpolation
property for a predicate symbol $R$ if we use the notion of intersection-shared
symbols, but has the $R$-interpolation property if we consider the less restrictive
notion of $\Theta$-shared symbols for a suitably defined closure operator $\Theta$.


Beth Definability. Let $\mathcal{T}$ be a theory with signature $\Pi = (\Sigma_0 \cup \Sigma_1, \mathsf{Pred})$, where
the function symbols in $\Sigma_0$ are interpreted function symbols and the function
symbols in $\Sigma_1$ are regarded as uninterpreted function symbols, and let $C$ be a
set of additional constants. We define a notion of Beth definability relative to a
subset $\Sigma_S \subseteq \Sigma_1 \cup C$ of non-interpreted function symbols and constants similar
to the one introduced in [31], which we refer to as $\Sigma_S$-Beth definability.

Let $\Sigma_S \subseteq \Sigma_1 \cup C$, let $\Sigma_r = \Sigma_1 \setminus \Sigma_S$, and let $\Pi' = (\Sigma_0 \cup (\Sigma_S \cap \Sigma_1) \cup \Sigma', \mathsf{Pred})$,
where $\Sigma' = \{ f' \mid f \in \Sigma_1 \setminus \Sigma_S \}$, be the signature obtained by replacing all
uninterpreted function symbols in $\Sigma_1$ which are not in $\Sigma_S$ with new primed copies. If $\phi$
is a $\Pi^C$-formula, we will denote by $\phi'$ the formula obtained from $\phi$ by replacing
all uninterpreted function symbols in $\Sigma_1 \setminus \Sigma_S$ and all constants in $C \setminus \Sigma_S$ with
distinct, primed versions. The interpreted function symbols and the uninterpreted
function symbols and constants in $\Sigma_S$ are not changed. We regard the theory $\mathcal{T}$
as a set of formulae; let $\mathcal{T}' := \{ \phi' \mid \phi \in \mathcal{T} \}$.¹
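
The renaming $\phi \mapsto \phi'$ is purely syntactic, so it can be illustrated by a short recursive function. The sketch below assumes a term encoding that is not part of the paper (nested tuples for applications, strings for constants); the name `prime` and the symbol name `meet` are hypothetical.

```python
def prime(term, unchanged):
    """Rename every symbol of `term` that is not in `unchanged` to a primed copy.

    Terms are nested tuples (symbol, arg1, ..., argn); constants are strings.
    `unchanged` should contain the interpreted symbols and those in Sigma_S.
    """
    if isinstance(term, str):
        return term if term in unchanged else term + "'"
    head, *args = term
    new_head = head if head in unchanged else head + "'"
    return (new_head, *(prime(arg, unchanged) for arg in args))


# Sigma_S = {g, e}; the interpreted meet symbol is also left unchanged.
UNCHANGED = {"meet", "g", "e"}
print(prime(("f", "e"), UNCHANGED))
print(prime(("g", "b"), UNCHANGED))
```

With $\Sigma_S = \{g, e\}$ the term $f(e)$ becomes $f'(e)$ and $g(b)$ becomes $g(b')$: the symbols in $\Sigma_S$ are kept and everything else is primed.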

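Let $A$ be a conjunction of ground $\Pi^C$-literals, and $a \in C$. We say that $a$ is
implicitly defined by $A$ w.r.t. $\Sigma_S$ and $\mathcal{T}$ if, with the notations introduced before,


$$A \wedge A' \models_{\mathcal{T} \cup \mathcal{T}'} a \approx a'.$$


We say that $a$ is explicitly defined by $A$ w.r.t. $\Sigma_S$ and $\mathcal{T}$ if there exists a term $t$
containing only symbols in $\Sigma_0$, $\mathsf{Pred}$ and $\Sigma_S$ such that $A \models_{\mathcal{T}} a \approx t$.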


Definition 3. Let $\mathcal{T}$ be a theory with uninterpreted function symbols in a set
$\Sigma_1$. Let $\Sigma_S \subseteq \Sigma_1 \cup C$. $\mathcal{T}$ has the Beth definability property w.r.t. $\Sigma_S$
($\Sigma_S$-Beth definability), if for every conjunction of literals $A$ and every $a \in C$, if $A$
implicitly defines $a$ w.r.t. $\Sigma_S$ and $\mathcal{T}$ then $A$ explicitly defines $a$ w.r.t. $\Sigma_S$ and $\mathcal{T}$.


In [4,6] it was proved that if a convex theory has the $\approx$-interpolation property,
then it has the Beth definability property. We give an analogous implication
between $\approx$-interpolation and Beth definability w.r.t. a subsignature.


Theorem 3. Let $\mathcal{T}$ be a convex theory with signature $\Pi = (\Sigma_0 \cup \Sigma_1, \mathsf{Pred})$, $C$
a set of constants, and $\Sigma_S \subseteq \Sigma_1 \cup C$. Let $\mathcal{T}'$ be as defined above.


(i) If $\mathcal{T} \cup \mathcal{T}'$ has the $\approx$-interpolation property with intersection-sharing, then $\mathcal{T}$
has the $\Sigma_S$-Beth definability property.

(ii) Assume that $\mathcal{T} = \mathcal{T}_0 \cup K$ where all symbols in the signature of $\mathcal{T}_0$ are regarded
as interpreted, and $K$ is a set of clauses also containing uninterpreted function
symbols in $\Sigma_1$. Let $\Theta_K$ be the closure operator defined in Example 1. If
$\mathcal{T} \cup \mathcal{T}'$ has the $\approx$-interpolation property with $\Theta_{K \cup K'}$-sharing, then $\mathcal{T}$ has the
$\Theta_K(\Sigma_S)$-Beth definability property.


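Proof (Idea): (i) Assume $a$ is implicitly definable w.r.t. $\Sigma_S$, i.e. there exists a
conjunction $A$ of literals such that if $A'$ is obtained by renaming as explained
before, then $A \wedge A' \models_{\mathcal{T} \cup \mathcal{T}'} a \approx a'$. Since $\mathcal{T} \cup \mathcal{T}'$ has $\approx$-interpolation, there exists
a term $t$ using only the functions and predicate symbols common to $A$ and $A'$
(i.e. the symbols in $\Sigma_0 \cup \Sigma_S$) such that $A \wedge A' \models_{\mathcal{T} \cup \mathcal{T}'} a \approx t \wedge t \approx a'$. It can be
shown that then $A \models_{\mathcal{T}} a \approx t$.

(ii) Assume $a$ is implicitly definable w.r.t. $\Theta_K(\Sigma_S)$, i.e. there exists a
conjunction $A$ of literals such that if $A'$ is obtained by renaming as explained before
then $A \wedge A' \models_{\mathcal{T} \cup \mathcal{T}'} a \approx a'$. The symbols shared by $A$ and $A'$ are the symbols in
$\Sigma_0 \cup \Sigma_S \cup \Theta_{K \cup K'}(\Sigma_S)$, where $\Theta_{K \cup K'}(\Sigma_S) = \bigcup_{f \in \Sigma_S \cap \Sigma_1} \{ g \in \Sigma_1 \cup \Sigma_1' \mid f \sim_{K \cup K'} g \}$.
It is easy to see that for every $f \in \Sigma_1 \setminus \Sigma_S$, $f' \in \Theta_{K'}(\Sigma_S)$ iff $f \in \Theta_K(\Sigma_S)$,
and $\Theta_{K \cup K'}(\Sigma_S) = \Theta_K(\Sigma_S) \cup \Theta_{K'}(\Sigma_S)$. Since we assumed that $\mathcal{T} \cup \mathcal{T}'$ has the
$\approx$-interpolation property with the notion of $\Theta_{K \cup K'}$-sharing, there exists a term $t$
over the signature $\Sigma_0 \cup \Theta_{K \cup K'}(\Sigma_S)$ such that $A \wedge A' \models_{\mathcal{T} \cup \mathcal{T}'} a \approx t \wedge t \approx a'$. The
term $t$ might contain primed versions of function symbols. We can show that we
can find a term $\bar{t}$ containing only symbols in $\Theta_K(\Sigma_S)$ such that $A \models_{\mathcal{T}} a \approx \bar{t}$.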


¹ A similar definition can be given if theories are defined as classes of models.

3 Local Theory Extensions


Let $\Pi_0 = (\Sigma_0, \mathsf{Pred})$ be a signature, and $\mathcal{T}_0$ be a “base” theory with signature $\Pi_0$.
We consider extensions $\mathcal{T} := \mathcal{T}_0 \cup K$ of $\mathcal{T}_0$ with new function symbols $\Sigma_1$ (extension
functions) whose properties are axiomatized using a set $K$ of (universally
closed) clauses in the extended signature $\Pi = (\Sigma_0 \cup \Sigma_1, \mathsf{Pred})$, which contain
function symbols in $\Sigma_1$. If $G$ is a finite set of ground $\Pi^C$-clauses, where $C$ is an
additional set of constants, and $K$ a set of $\Pi$-clauses, we will denote by $\mathsf{st}(K, G)$
(resp. $\mathsf{est}(K, G)$) the set of all ground terms (resp. extension ground terms, i.e.
terms starting with a function in $\Sigma_1$) which occur in $G$ or $K$. In this paper we
regard every finite set $G$ of ground clauses as the ground formula $\bigwedge_{C \in G} C$. If $T$
is a set of ground terms in the signature $\Pi^C$, we denote by $K[T]$ the set of all
instances of $K$ in which the terms starting with a function symbol in $\Sigma_1$ are in
$T$. Let $\Psi$ be a map associating with every finite set $T$ of ground terms a finite
set $\Psi(T)$ of ground terms containing $T$. For any set $G$ of ground $\Pi^C$-clauses we
write $K[\Psi_K(G)]$ for $K[\Psi(\mathsf{est}(K, G))]$. We define:

$(\mathsf{Loc}_f^\Psi)$ For every finite set $G$ of ground clauses in $\Pi^C$ it holds that
$\mathcal{T}_0 \cup K \cup G \models \bot$ if and only if $\mathcal{T}_0 \cup K[\Psi_K(G)] \cup G$ is unsatisfiable.


Extensions satisfying condition $(\mathsf{Loc}_f^\Psi)$ are called $\Psi$-local. If $\Psi$ is the identity we
obtain the notion of local theory extensions [21]; if in addition $\mathcal{T}_0$ is the theory
of pure equality we obtain the notion of local theories [9,15].


Hierarchical Reasoning. Consider a $\Psi$-local theory extension $\mathcal{T}_0 \subseteq \mathcal{T}_0 \cup K$.
Condition $(\mathsf{Loc}_f^\Psi)$ requires that for every finite set $G$ of ground $\Pi^C$-clauses,
$\mathcal{T}_0 \cup K \cup G \models \bot$ iff $\mathcal{T}_0 \cup K[\Psi_K(G)] \cup G \models \bot$. In all clauses in $K[\Psi_K(G)] \cup G$ the function
symbols in $\Sigma_1$ only have ground terms as arguments, so $K[\Psi_K(G)] \cup G$ can be
flattened and purified by introducing, in a bottom-up manner, new constants
$c_t \in C$ for subterms $t = f(c_1, \dots, c_n)$ where $f \in \Sigma_1$ and $c_i$ are constants, together
with definitions $c_t \approx f(c_1, \dots, c_n)$. We thus obtain a set of clauses $K_0 \cup G_0 \cup \mathsf{Def}$,
where $K_0$ and $G_0$ do not contain $\Sigma_1$-function symbols and $\mathsf{Def}$ contains clauses
of the form $c \approx f(c_1, \dots, c_n)$, where $f \in \Sigma_1$ and $c, c_1, \dots, c_n$ are constants.
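
The bottom-up introduction of constants can be sketched as follows. This is a minimal illustration, not the authors' implementation: terms are assumed to be nested tuples with strings as constants, arguments of extension symbols are assumed to be constants or extension terms, and the fresh-name scheme `k1, k2, ...` is hypothetical (it presumes these names are not already in use).

```python
def purify(term, sigma1, defs):
    """Flatten `term` bottom-up and return the term that replaces it.

    Terms are nested tuples (symbol, arg1, ..., argn); constants are strings.
    `defs` maps each flat extension term f(c1, ..., cn) to its new constant,
    i.e. it plays the role of the set Def.
    """
    if isinstance(term, str):
        return term
    head, *args = term
    flat = (head, *(purify(arg, sigma1, defs) for arg in args))
    if head not in sigma1:
        return flat                       # base symbols stay in place
    if flat not in defs:
        defs[flat] = f"k{len(defs) + 1}"  # assumed fresh-name scheme
    return defs[flat]


defs = {}
result = purify(("g", ("f", "a")), {"f", "g"}, defs)
print(result, defs)
```

For the nested term $g(f(a))$ the inner term $f(a)$ is named first, and $g$ is then applied to the new constant, so every definition in `defs` is flat.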


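Theorem 4 ([11,12,21]). Let $K$ be a set of clauses. Assume that $\mathcal{T}_0 \subseteq \mathcal{T}_0 \cup K$ is
a $\Psi$-local theory extension. For any finite set $G$ of flat ground clauses (with no
nestings of extension functions), let $K_0 \cup G_0 \cup \mathsf{Def}$ be obtained from $K[\Psi_K(G)] \cup G$
by flattening and purification, as explained above. Then the following are
equivalent to $\mathcal{T}_0 \cup K \cup G \models \bot$:


(i) $\mathcal{T}_0 \cup K[\Psi_K(G)] \cup G \models \bot$.
(ii) $\mathcal{T}_0 \cup K_0 \cup G_0 \cup \mathsf{Con}_0 \models \bot$, where


$$\mathsf{Con}_0 = \Big\{ \bigwedge_{i=1}^{n} c_i \approx d_i \rightarrow c \approx d \;\Big|\; f(c_1, \dots, c_n) \approx c \in \mathsf{Def},\ f(d_1, \dots, d_n) \approx d \in \mathsf{Def} \Big\}.$$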
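
The set $\mathsf{Con}_0$ of congruence instances is determined by the definitions alone. A small sketch, assuming definitions are stored as a dictionary from flat terms to constants (a hypothetical encoding) and omitting symmetric and trivial instances:

```python
from itertools import combinations


def congruence_instances(defs):
    """Build Con_0 from definitions given as {(f, c1, ..., cn): c}.

    Each instance is a pair (premises, conclusion), read as
    c1 = d1 and ... and cn = dn  ->  c = d.
    """
    instances = []
    for (t1, c), (t2, d) in combinations(defs.items(), 2):
        same_symbol = t1[0] == t2[0] and len(t1) == len(t2)
        if same_symbol:
            premises = list(zip(t1[1:], t2[1:]))
            instances.append((premises, (c, d)))
    return instances


DEFS = {("g", "a"): "a1", ("g", "c"): "c1", ("f", "b"): "b1"}
print(congruence_instances(DEFS))
```

With the three definitions above, only the two $g$-terms give rise to an instance, namely $a \approx c \rightarrow a_1 \approx c_1$.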


In [12] we showed that for extensions with sets of flat and linear clauses $\Psi$-locality
can be checked by checking whether an embeddability condition of partial into
total models holds. In [26] we mention (without proof) that the proof in [12] can
be extended to situations in which the clauses in $K$ are not linear. The result is
presented below. A full proof is given in the extended version of this paper [18].


Theorem 5. Let $K$ be a set of $\Sigma_1$-flat clauses, and $\Psi_K$ be a term closure operator
such that for every set $T$ of ground terms and for every clause $D$ in $K$, if a
variable occurs in two terms in $D$ then either the two terms are identical, or
the variable occurs below two different unary function symbols $f$ and $g$ and,
for every constant $c$, $f(c)$ is in $\Psi_K(T)$ iff $g(c)$ is in $\Psi_K(T)$. If all partial models
$\mathcal{A}$ of $\mathcal{T}_0 \cup K$ with totally defined $\Sigma_0$-functions, and for which the set of terms
$\{ f(a_1, \dots, a_n) \mid f \in \Sigma_1 \text{ and } f_{\mathcal{A}}(a_1, \dots, a_n) \text{ is defined} \}$ is finite and closed under
$\Psi_K$, embed into total models of $\mathcal{T}_0 \cup K$, then the extension $\mathcal{T}_0 \cup K$ satisfies $(\mathsf{Loc}_f^\Psi)$.


4 R-interpolation in Local Theory Extensions


In [24] we considered convex and P-interpolating theories $\mathcal{T}_0$ with signature
$\Pi_0 = (\Sigma_0, \mathsf{Pred})$ (where $P \subseteq \mathsf{Pred}$). We studied $\Psi$-local extensions $\mathcal{T} = \mathcal{T}_0 \cup K$
of $\mathcal{T}_0$ with new function symbols in a set $\Sigma_1$ axiomatized by a set $K$ of clauses,
with the property that all clauses in $K$ are of the form:


$$\begin{cases}
x_1 R_1 s_1 \wedge \dots \wedge x_n R_n s_n \rightarrow f(x_1, \dots, x_n)\, R\, g(y_1, \dots, y_n) \\
x_1 R_1 y_1 \wedge \dots \wedge x_n R_n y_n \rightarrow f(x_1, \dots, x_n)\, R\, f(y_1, \dots, y_n)
\end{cases} \qquad (1)$$

where $n \geq 1$, $x_1, \dots, x_n, y_1, \dots, y_n$ are variables, $f, g \in \Sigma_1$, $R_1, \dots, R_n, R$
are binary relations with $R_1, \dots, R_n \in P$ and $R$ transitive, and each $s_i$
is either a variable among the arguments of $g$, or a term of the form
$f_i(z_1, \dots, z_k)$, where $f_i \in \Sigma_1$ and all the arguments of $f_i$ are variables
occurring among the arguments of $g$.


Example 2. A set $K$ of axioms containing clauses of the form:


$$x_1 \leq h(y_1) \rightarrow f(x_1) \leq g(y_1)$$
$$x_1 \leq y_1 \rightarrow f(x_1) \leq f(y_1)$$


satisfies the conditions above: $n = 1$, $R_1 = R = \leq$, $s_1 = h(y_1)$, $f, g, h \in \Sigma_1$.


In [24], we proved that if $\mathcal{T}_0$ allows ground interpolation, then $\mathcal{T}$ allows ground
interpolation, and that the interpolants can be computed in a hierarchical way,
using a method for ground interpolation in $\mathcal{T}_0$. We now show that under the
conditions above, the property of P-interpolation can be transferred from the
theory $\mathcal{T}_0$ to the extension $\mathcal{T} = \mathcal{T}_0 \cup K$ of $\mathcal{T}_0$. The function symbols in the
signature of $\mathcal{T}_0$ are considered to be interpreted, and will always be considered
to be shared. For the function symbols in the signature $\Sigma_1$, which are considered
to be “quasi”-interpreted, we use the notion of $\Theta_K$-sharing introduced in Sect. 2.

In order to show that $\mathcal{T}$ has the P-interpolation property, we need to prove
that if $A$, $B$ are conjunctions of atoms and $A(\bar{c}, \bar{a}_1, a) \wedge B(\bar{c}, \bar{b}_1, b) \models_{\mathcal{T}} a R b$, where
$R \in P$, then there exists a term $t$ containing only the constants common to $A$
and $B$ and only function symbols which are $\Theta_K$-shared by $A$ and $B$, such that
$A(\bar{c}, \bar{a}_1, a) \wedge B(\bar{c}, \bar{b}_1, b) \models_{\mathcal{T}} a R t \wedge t R b$.

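$A(\bar{c}, \bar{a}_1, a) \wedge B(\bar{c}, \bar{b}_1, b) \models_{\mathcal{T}} a R b$ iff $A(\bar{c}, \bar{a}_1, a) \wedge B(\bar{c}, \bar{b}_1, b) \wedge \neg(a R b) \models_{\mathcal{T}} \bot$.
By Theorem 4 we can purify and flatten this conjunction and obtain a
conjunction of unit clauses $A_0 \wedge B_0 \wedge \mathsf{Def} \wedge \neg(a R b)$, where $\mathsf{Def}$ is a set of definitions
of newly introduced constants. Let $T$ be the extension terms in $\mathsf{Def}$. We introduce
new constants and definitions also for all extension terms in $\Psi(T)$. This
new set of definitions can be written as a conjunction $D_A \wedge D_B$ of its A-part and
its B-part. By the $\Psi$-locality of the extension $\mathcal{T}_0 \subseteq \mathcal{T}_0 \cup K$ and Theorem 4,


$$A_0 \wedge B_0 \wedge \mathsf{Def} \wedge \neg(a R b) \models_{\mathcal{T}} \bot \ \text{ iff } \ K_0 \wedge A_0 \wedge B_0 \wedge \mathsf{Con}[D_A \wedge D_B]_0 \wedge \neg(a R b) \models_{\mathcal{T}_0} \bot,$$


where $K_0$ is obtained from $K[D_A \wedge D_B]$ by replacing the $\Sigma_1$-terms with the
corresponding constants contained in the definitions $D_A \wedge D_B$ and

$$\mathsf{Con}[D_A \wedge D_B]_0 = \bigwedge \Big\{ \bigwedge_{i=1}^{n} c_i \approx d_i \rightarrow c \approx d \;\Big|\; f(c_1, \dots, c_n) \approx c,\ f(d_1, \dots, d_n) \approx d \in D_A \wedge D_B \Big\}.$$

In general, $\mathsf{Con}[D_A \wedge D_B]_0 = \mathsf{Con}_0^A \wedge \mathsf{Con}_0^B \wedge \mathsf{Con}_{mix}$ and $K_0 = K_0^A \wedge K_0^B \wedge K_{mix}$,
where $\mathsf{Con}_0^A, K_0^A$ only contain extension functions and constants which occur in
$A$, $\mathsf{Con}_0^B, K_0^B$ only contain extension functions and constants which occur in $B$,
and $\mathsf{Con}_{mix}, K_{mix}$ contain mixed clauses with constants occurring in both $A$ and
$B$. Our goal is to separate $\mathsf{Con}_{mix}$ and $K_{mix}$ into an A-part and a B-part, which
would allow us to use the P-interpolation property of theory $\mathcal{T}_0$.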


Proposition 6. Assume that $\mathcal{T}_0$ is convex and P-interpolating. Let $H$ be a set
of Horn clauses $(\bigwedge_{i=1}^{n} c_i R_i d_i) \rightarrow c R_0 d$ in the signature $\Pi_0^C$ (with $R_0$ transitive
and $R_i \in P$) which are instances of flattened and purified clauses of type (1) and
of congruence axioms. Let $H_{mix}$ be the mixed clauses in $H$:

$$H_{mix} = \{ \textstyle\bigwedge_{i=1}^{n} c_i R_i d_i \rightarrow c R_0 d \in H \mid c_i, c \text{ constants in } A,\ d_i, d \text{ constants in } B \} \ \cup$$
$$\{ \textstyle\bigwedge_{i=1}^{n} c_i R_i d_i \rightarrow c R_0 d \in H \mid c_i, c \text{ constants in } B,\ d_i, d \text{ constants in } A \}$$

Let $A_0$ and $B_0$ be conjunctions of ground literals in the signature $\Pi_0^C$ such that
$A_0 \wedge B_0 \wedge H \wedge \neg(a R b) \models_{\mathcal{T}_0} \bot$. Then $H$ can be separated into an A- and a B-part
by replacing the set $H_{mix}$ of mixed clauses with a separated set of formulae $H_{sep}$:

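(i) There exists a set $T$ of $(\Sigma_0 \cup C)$-terms containing only constants common
to $A_0$ and $B_0$ such that $A_0 \wedge B_0 \wedge (H \setminus H_{mix}) \wedge H_{sep} \wedge \neg(a R b) \models_{\mathcal{T}_0} \bot$, where

$$H_{sep} = \Big\{ \big(\textstyle\bigwedge_{i=1}^{n} c_i R_i t_i \rightarrow c R_0 c_{f(t_1, \dots, t_n)}\big) \wedge \big(\textstyle\bigwedge_{i=1}^{n} t_i R_i d_i \rightarrow c_{f(t_1, \dots, t_n)} R_0 d\big) \;\Big|\; \textstyle\bigwedge_{i=1}^{n} c_i R_i d_i \rightarrow c R_0 d \in H_{mix},$$
$$d_i \approx s_i(e_1, \dots, e_n),\ d \approx g(e_1, \dots, e_n) \in D_B,\ c \approx f(c_1, \dots, c_n) \in D_A \text{ or vice versa} \Big\} = H_{sep}^A \wedge H_{sep}^B$$

and $c_{f(t_1, \dots, t_n)}$ are new constants in $\Sigma_c$ (considered to be common) introduced
for the corresponding terms $f(t_1, \dots, t_n)$, where for $i \in \{1, \dots, n\}$, $t_i$
separates the atom $c_i R_i d_i$, which is entailed by the already deduced atoms.
(ii) $A_0 \wedge B_0 \wedge (H \setminus H_{mix}) \wedge H_{sep} \wedge \neg(a R b)$ is logically equivalent with respect to $\mathcal{T}_0$
with the following separated conjunction of ground literals:

$$\bar{A}_0 \wedge \bar{B}_0 \wedge \neg(a R b) = A_0 \wedge B_0 \wedge \neg(a R b) \wedge \bigwedge \{ c R_0 d \mid \Gamma \rightarrow c R_0 d \in H \setminus H_{mix} \} \ \wedge$$
$$\bigwedge \{ c R_0 c_{f(\bar{t})} \wedge c_{f(\bar{t})} R_0 d \mid (\Gamma \rightarrow c R_0 c_{f(\bar{t})}) \wedge (\Gamma' \rightarrow c_{f(\bar{t})} R_0 d) \in H_{sep} \}.$$

Proof (Idea). The proof is similar to that of Prop. 5.7 in [24]. (i) and (ii) are
proved simultaneously by induction on the number of clauses in $H$. If $H = \emptyset$, it
is already separated. Otherwise, one can prove that either $A_0 \wedge B_0 \models_{\mathcal{T}_0} a R b$, in
which case we are done, or $A_0 \wedge B_0$ entails all the premises of some clause $C$ in
$H$. If $C$ contains only constants in $A_0$ or $B_0$ we can remove it from $H$, add its
conclusion to $A_0 \wedge B_0$ and repeat the procedure with the new $A_0 \wedge B_0$ and $H$. If
the clause is mixed, we can compute terms $t_i$ which separate the premises in $C$,
separate $C$ into an instance $C_1$ of monotonicity and an instance $C_2$ of a clause
in $K$, remove $C$ from $H$, add to $A_0 \wedge B_0$ the conclusions of the clauses $C_1, C_2$,
and repeat the procedure with the new $A_0 \wedge B_0$ and $H$.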


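Theorem 7. Assume that $\mathcal{T}_0$ is convex and P-interpolating with respect to $P \subseteq \mathsf{Pred}$,
and that $\mathcal{T} = \mathcal{T}_0 \cup K$ is a local extension of $\mathcal{T}_0$ with a set of clauses $K$ which
only contains combinations of clauses of type (1). Then $\mathcal{T}$ is also P-interpolating.


Proof (Idea). We prove that if $A$, $B$ are conjunctions of literals and $A(\bar{c}, \bar{a}_1, a) \wedge
B(\bar{c}, \bar{b}_1, b) \models_{\mathcal{T}} a R b$ where $R \in P$, then there exists a term $t$ containing only the
constants common to $A$ and $B$ and only function symbols which are shared by $A$
and $B$, such that $A(\bar{c}, \bar{a}_1, a) \wedge B(\bar{c}, \bar{b}_1, b) \models_{\mathcal{T}} a R t \wedge t R b$. We can restrict w.l.o.g.
to a purified and flattened conjunction of unit clauses $A_0 \wedge B_0 \wedge \mathsf{Def} \wedge \neg(a R b)$.
With the notation introduced before Proposition 6, by Theorem 4 we have:

$$A_0 \wedge B_0 \wedge \mathsf{Def} \wedge \neg(a R b) \models_{\mathcal{T}} \bot \ \text{ iff } \ K_0 \wedge A_0 \wedge B_0 \wedge \mathsf{Con}[D_A \wedge D_B]_0 \wedge \neg(a R b) \models_{\mathcal{T}_0} \bot.$$

By Proposition 6 (ii), there exists a set $T$ of $(\Sigma_0 \cup C)$-terms containing only
constants common to $A_0$ and $B_0$ such that $H = K_0 \wedge \mathsf{Con}[D_A \wedge D_B]_0$ can be
separated as described in Proposition 6, and $A_0 \wedge B_0 \wedge (H \setminus H_{mix}) \wedge H_{sep} \wedge \neg(a R b)$
is logically equivalent w.r.t. $\mathcal{T}_0$ with a separated conjunction of ground literals
$\bar{A}_0 \wedge \bar{B}_0 \wedge \neg(a R b)$, which is therefore unsatisfiable, so $\bar{A}_0 \wedge \bar{B}_0 \models_{\mathcal{T}_0} a R b$. From the
P-interpolation property in $\mathcal{T}_0$, there exists a term $t$ containing only the shared constants
such that $\bar{A}_0 \wedge \bar{B}_0 \models_{\mathcal{T}_0} a R t \wedge t R b$. If we now replace all constants $c_{f(t_1, \dots, t_n)}$
introduced in the purification process or in the separation process with the terms
they denote, we obtain $A \wedge B \models_{\mathcal{T}} a R t \wedge t R b$.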


We obtain the following procedure for P-interpolation if $A \wedge B \models_{\mathcal{T}} a R b$:


Step 1: Preprocess. Using locality, flattening and purification we obtain a set
$H \wedge A_0 \wedge B_0$ of formulae in the base theory, where $H$ is as in Proposition 6.

Step 2: $\Delta := \top$. Repeat as long as $A_0 \wedge B_0 \wedge \Delta \not\models_{\mathcal{T}_0} a R b$:
Let $C \in H$ be a clause whose premise is entailed by $A_0 \wedge B_0 \wedge \Delta$.
If $C$ is not mixed, move $C$ to $H_{sep}$ and add its conclusion to $\Delta$.
If $C$ is mixed, compute terms $t_i$ which separate the premises in $C$, and
separate the clause into an instance $C_1$ of monotonicity and an instance $C_2$ of
a clause in $K$ as in the proof of Proposition 6. Remove $C$ from $H$, and add
$C_1, C_2$ to $H_{sep}$ and their conclusions to $\Delta$.

Step 3: Compute separating term. Compute a separating term for $A_0 \wedge B_0 \wedge
\Delta \models a R b$ in $\mathcal{T}_0$, and construct an interpolant for the extension as explained
in the proof of Theorem 7.

5 Example: Semilattices with Monotone Operators


We will now analyze $\leq$-interpolation properties for theories of semilattices with
monotone operators. A semilattice $(S, \sqcap)$ is a set $S$ with a binary operation $\sqcap$
which is associative, commutative and idempotent. One can equivalently regard
semilattices as partially ordered sets $(S, \leq)$, in which infima of finite non-empty
subsets exist; then $a \leq b$ iff $a \sqcap b = a$.
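
For a finite carrier both views can be checked mechanically. The sketch below is only an illustration; the function names are hypothetical and the meet is passed in as an ordinary Python function.

```python
from itertools import product


def is_semilattice(elems, meet):
    """Check associativity, commutativity and idempotence on a finite carrier."""
    assoc = all(meet(meet(x, y), z) == meet(x, meet(y, z))
                for x, y, z in product(elems, repeat=3))
    comm = all(meet(x, y) == meet(y, x) for x, y in product(elems, repeat=2))
    idem = all(meet(x, x) == x for x in elems)
    return assoc and comm and idem


def leq(meet, a, b):
    """Induced order: a <= b iff meet(a, b) == a."""
    return meet(a, b) == a


# Subsets of {1, 2} with intersection as meet (a semilattice of sets).
SUBSETS = [frozenset(s) for s in ([], [1], [2], [1, 2])]
inter = lambda x, y: x & y
print(is_semilattice(SUBSETS, inter))
print(leq(inter, frozenset([1]), frozenset([1, 2])))
```

Here the induced order is set inclusion, which matches the remark that every semilattice can be represented as a semilattice of sets.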

The theory $\mathsf{SLat}$ of semilattices can be axiomatized by equations (associativity,
commutativity and idempotence of $\sqcap$), hence clearly is $\approx$-convex. Convexity
w.r.t. $\leq$ follows from the fact that $x \leq y$ iff $(x \sqcap y) \approx x$. The theory $\mathsf{SLat}$ is
$\leq$-interpolating, therefore also $\approx$-interpolating (cf. also [17]; we present the idea
of the proof since it indicates how the intermediate terms can be computed):


Lemma 8. The theory $\mathsf{SLat}$ of semilattices is $\leq$-interpolating.


Proof (Idea): This is a constructive proof based on the fact that every semilattice
is isomorphic to a sublattice of a power of $S_2$, where $S_2$ is the 2-element
semilattice (or, alternatively, that every semilattice is isomorphic to a semilattice
of sets). We prove that if $A$ and $B$ are two conjunctions of literals and
$A \wedge B \models_{\mathsf{SLat}} a \leq b$, where $a$ is a constant occurring in $A$ and $b$ a constant
occurring in $B$, then there exists a term $t$ containing only common constants in
$A$ and $B$ such that $A \wedge B \models_{\mathsf{SLat}} a \leq t$ and $A \wedge B \models_{\mathsf{SLat}} t \leq b$. We can assume
without loss of generality that $A$ and $B$ consist only of atoms (for details cf.
[17]). $A \wedge B \models_{\mathsf{SLat}} a \leq b$ if and only if the following conjunction of literals in
propositional logic is unsatisfiable:


$$\begin{array}{lll|lll}
N_A: & P_{e_1 \sqcap e_2} \leftrightarrow P_{e_1} \wedge P_{e_2} & & N_B: & P_{g_1 \sqcap g_2} \leftrightarrow P_{g_1} \wedge P_{g_2} & \\
 & P_{e_1} \leftrightarrow P_{e_2} & e_1 \approx e_2 \in A & & P_{g_1} \leftrightarrow P_{g_2} & g_1 \approx g_2 \in B \\
 & P_{e_1} \rightarrow P_{e_2} & e_1 \leq e_2 \in A & & P_{g_1} \rightarrow P_{g_2} & g_1 \leq g_2 \in B \\
 & \text{for all } e_1, e_2 \text{ subterms in } A & & & \text{for all } g_1, g_2 \text{ subterms in } B & \\
\multicolumn{6}{c}{P_a \wedge \neg P_b}
\end{array}$$


We obtain an unsatisfiable set of clauses $(N_A \wedge P_a) \wedge (N_B \wedge \neg P_b) \models \bot$, where $N_A$
and $N_B$ are sets of Horn clauses in which each clause contains a positive literal.
We show that if $A \wedge B \models_{\mathsf{SLat}} a \leq b$ holds, then for the term


$$t = \bigsqcap \{ e \mid A \models_{\mathsf{SLat}} a \leq e,\ e \text{ common subterm of } A \text{ and } B \}$$


we have (i) $A \models_{\mathsf{SLat}} a \leq t$, and (ii) $A \wedge B \models_{\mathsf{SLat}} t \leq b$.

Clearly, $A \models_{\mathsf{SLat}} a \leq t$, thus (i) holds. For proving (ii), we analyze the set
of clauses obtained by saturating $N_A \wedge P_a$ under ordered resolution in which all
propositional variables occurring in $A$ but not in $B$ are larger than the common
symbols. It is proved that for deriving the contradiction only the unit clauses $P_e$,
where $e$ is a common subterm of $A$ and $B$ and $A \models a \leq e$, and certain resolvents
of $N_A \wedge P_a$ are needed. The full proof is given in [17] and also in [18].

We illustrate the computation of intermediate terms on an example.

Example 3. Let $A = \{a_1 \leq c_1,\ c_2 \leq a_2,\ a_2 \leq c_3\}$ and $B = \{c_1 \leq b_1,\ b_1 \leq c_2,\ c_3 \leq b_2\}$.
It is easy to see that $A \wedge B \models a_1 \leq b_2$. We can find an intermediate
term by using the methods described in the proof of Lemma 8: We saturate the
set of clauses

$$N_A \wedge P_{a_1} = (P_{a_1} \rightarrow P_{c_1}) \wedge (P_{c_2} \rightarrow P_{a_2}) \wedge (P_{a_2} \rightarrow P_{c_3}) \wedge P_{a_1}$$

under ordered resolution, in which the propositional variables $P_{a_1}, P_{a_2}$ are larger
than $P_{c_1}, P_{c_2}, P_{c_3}$. This yields the clauses $P_{c_1}$ and $P_{c_2} \rightarrow P_{c_3}$ containing shared
propositional variables. $(N_A \wedge P_{a_1}) \wedge (N_B \wedge \neg P_{b_2})$ is unsatisfiable iff $N_B \wedge \neg P_{b_2} \wedge
P_{c_1} \wedge (P_{c_2} \rightarrow P_{c_3})$ is unsatisfiable. Indeed $t = c_1$ is an intermediate term, as
$A \models a_1 \leq c_1$ and $A \wedge B \models c_1 \leq b_2$. Note that $N_B \wedge \neg P_{b_2} \wedge P_{c_1}$ is satisfiable, so
$B \not\models c_1 \leq b_2$. Moreover, we only need $P_{c_2} \rightarrow P_{c_3}$ in addition to $N_B \wedge \neg P_{b_2} \wedge P_{c_1}$ to
derive $\bot$, thus $A \wedge B \models c_1 \leq b_2$ and the clause $P_{c_2} \rightarrow P_{c_3}$ obtained from $N_A$ is
really needed for this.
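
For conjunctions of atoms $x \leq y$ between constants only (no occurrences of $\sqcap$), the Horn encoding from the proof of Lemma 8 reduces to clauses $P_x \rightarrow P_y$, and the unit clauses derivable from them can be found by forward chaining. The following sketch uses forward chaining in place of the ordered resolution used in the text, and works under exactly that restriction; the function names are hypothetical.

```python
def derivable(atoms, facts):
    """Forward chaining: constants e with P_e derivable from the unit
    clauses P_c (c in facts) and the Horn clauses P_lo -> P_hi."""
    reached, todo = set(facts), list(facts)
    while todo:
        x = todo.pop()
        for lo, hi in atoms:
            if lo == x and hi not in reached:
                reached.add(hi)
                todo.append(hi)
    return reached


def intermediate_constants(a_atoms, b_atoms, a, b):
    """Common constants e with A |= a <= e; their meet is the candidate t.
    Returns None if A and B together do not entail t <= b."""
    constants = lambda atoms: {c for atom in atoms for c in atom}
    common = constants(a_atoms) & constants(b_atoms)
    above_a = derivable(a_atoms, {a}) & common
    if b in derivable(a_atoms + b_atoms, above_a):
        return above_a
    return None


A = [("a1", "c1"), ("c2", "a2"), ("a2", "c3")]   # pairs (x, y) stand for x <= y
B = [("c1", "b1"), ("b1", "c2"), ("c3", "b2")]
print(intermediate_constants(A, B, "a1", "b2"))
```

On the data of the example the only common constant above $a_1$ is $c_1$, so the computed intermediate term is $t = c_1$; deriving $P_{b_2}$ from $P_{c_1}$ requires atoms of both $A$ and $B$.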


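Semilattices with operators. Let $\Sigma$ be a set of unary² function symbols.
We consider the extension $\mathsf{SLat}_\Sigma = \mathsf{SLat} \cup \mathsf{Mon}(\Sigma)$ of $\mathsf{SLat}$ with new function
symbols in $\Sigma$ satisfying the monotonicity axioms $\mathsf{Mon}_\Sigma = \bigcup_{f \in \Sigma} \mathsf{Mon}(f)$, where:


$$\mathsf{Mon}(f) = \forall x, y\, (x \leq y \rightarrow f(x) \leq f(y))$$

and also extensions $\mathsf{SLat} \cup \mathsf{Mon}(\Sigma) \cup K$, where $K$ is a set of axioms of the form:

$$\forall x \quad f(x) \leq g(x) \qquad (2)$$
$$\forall x, y \quad y \leq g(x) \rightarrow f(y) \leq h(x) \qquad (3)$$


where $f, g, h \in \Sigma$, not necessarily all different.
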
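Lemma 9. The following extensions satisfy a locality property:


(i) The theory of semilattices $\mathsf{SLat}$ is local.
(ii) $\mathsf{SLat} \cup \mathsf{Mon}_\Sigma$ is a local extension of $\mathsf{SLat}$.
(iii) $\mathsf{SLat} \cup \mathsf{Mon}_\Sigma \cup K$ is a $\Psi$-local extension of $\mathsf{SLat}$, where $\Psi$ is the closure
operator on ground terms defined as follows:


$\Psi(G) = \bigcup_{i \geq 0} \Psi^i(G)$, with $\Psi^0(G) = \mathsf{est}(G)$ (the set of ground terms in $G$
starting with extension functions), and

$$\begin{array}{rl}
\Psi^{i+1}(G) = \Psi^i(G) & \cup\ \{ h(c) \mid (\forall x\ g(x) \leq h(x)) \in K \text{ and } g(c) \in \Psi^i(G) \} \\
 & \cup\ \{ g(c) \mid (\forall x\ g(x) \leq h(x)) \in K \text{ and } h(c) \in \Psi^i(G) \} \\
 & \cup\ \{ h(c) \mid (\forall x, y\ y \leq g(x) \rightarrow f(y) \leq h(x)) \in K \text{ and } g(c) \in \Psi^i(G) \} \\
 & \cup\ \{ g(c) \mid (\forall x, y\ y \leq g(x) \rightarrow f(y) \leq h(x)) \in K \text{ and } h(c) \in \Psi^i(G) \}.
\end{array}$$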
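
The closure $\Psi$ can be computed by a simple fixpoint iteration. The sketch below assumes, as the readable parts of the definition suggest, that for every axiom of $K$ relating $g(x)$ and $h(x)$ the term $g(c)$ is added whenever $h(c)$ is present and vice versa; the encoding of axioms as pairs `(g, h)` and of ground terms as pairs `(symbol, constant)` is hypothetical.

```python
def psi_closure(est, linked):
    """Close a set of ground extension terms (symbol, constant) so that, for
    every pair (g, h) in `linked`, g(c) is present iff h(c) is present."""
    closed = set(est)
    changed = True
    while changed:
        changed = False
        for sym, c in list(closed):
            for g, h in linked:
                for src, dst in ((g, h), (h, g)):
                    if sym == src and (dst, c) not in closed:
                        closed.add((dst, c))
                        changed = True
    return closed


EST = {("g", "a"), ("g", "c"), ("f", "b")}
print(sorted(psi_closure(EST, [("g", "g")])))   # axiom links g with itself
print(sorted(psi_closure(EST, [("g", "h")])))   # axiom links g with h
```

When the axiom relates $g$ only with itself the closure adds nothing, so $\Psi(G) = \mathsf{est}(G)$; with a second symbol $h$ the terms $h(a)$ and $h(c)$ are added.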


Proof: (i) follows from a result on the locality of lattices by Skolem [20], or by
results in [9], since every partial semilattice weakly embeds into a total one.
(ii) follows from results in [27,28]. (iii) Since the axioms in $K$ are not always
linear, we use the locality criterion for non-linear sets of clauses mentioned in
Theorem 5, and the fact that every semilattice $P = (S, \sqcap, \{f\}_{f \in \Sigma})$ with partially
defined monotone operators satisfying the axioms $K$, and with the property that
if a variable occurs in two terms $g(x), h(x)$ in a clause in $K$, then for every $s \in S$,
$g(s)$ is defined iff $h(s)$ is defined, weakly embeds into a semilattice with totally
defined operators satisfying $K$, which was proved in Lemma 4.5 from [26].


² We assume that the function symbols are unary to simplify the presentation, and
because in the applications to description logics we need only unary function symbols.
All the results can be extended to function symbols of higher arity.


Given two sets of conjunctions of ground literals $A$ and $B$ over the signature of
semilattices with operators, we consider the lattice operation $\sqcap$ to be interpreted
and the function symbols in $\Sigma$ to be uninterpreted. Let $\Sigma_A$ be the function
symbols in $\Sigma$ occurring in $A$ and $\Sigma_B$ those occurring in $B$. We consider the
following variants for “shared uninterpreted function symbols”:


— Intersection-sharing: The shared function symbols of $A$ and $B$ are the function
symbols in $\Sigma_A \cap \Sigma_B$.

— $\Theta_K$-sharing: Let $\Theta_K(\Sigma_A)$ and $\Theta_K(\Sigma_B)$ be defined as explained in Example 1.
The $\Theta_K$-shared function symbols are the function symbols in
$\Theta_K(\Sigma_A) \cap \Theta_K(\Sigma_B)$.


Theorem 10. For every set $K$ containing clauses of the form (2) and (3) above,
the theory $\mathsf{SLat} \cup \mathsf{Mon}_\Sigma \cup K$ of semilattices with monotone operators satisfying
axioms $K$ is $\leq$-interpolating with the notion of $\Theta_K$-sharing for uninterpreted
function symbols.


Proof: The clauses of type (2) and (3) satisfy the conditions in the statement
of Proposition 6 and Theorem 7. The result is therefore a consequence of the
fact that $\mathsf{SLat}$ is convex and $\{\approx, \leq\}$-interpolating, and of Proposition 6 and
Theorem 7.


We illustrate below how Theorem 4, Proposition 6, Theorem 7 and the algorithm
in Sect. 4 can be used for computing intermediate terms:


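Example 4. Consider the extension $\mathsf{SLO} = \mathsf{SLat} \cup \mathsf{Mon}_f \cup \mathsf{Mon}_g \cup K$ of $\mathsf{SLat}$
with two monotone functions $f, g$ satisfying: $K = \{ y \leq g(x) \rightarrow f(y) \leq g(x) \}$.
Consider the following conjunctions of atoms: $A := d \leq g(a) \wedge a \leq c \wedge g(c) \leq a$
and $B := b \leq d \wedge b \leq f(b)$. It can be checked that $A \wedge B \models_{\mathsf{SLO}} b \leq a$.

To obtain a separating term we proceed as follows: By the definition of $\mathsf{SLO}$,
$A \wedge B \models_{\mathsf{SLO}} b \leq a$ iff $\mathsf{SLat} \wedge \mathsf{Mon}_f \wedge \mathsf{Mon}_g \wedge K \wedge A \wedge B \wedge \neg(b \leq a) \models \bot$. By
Theorem 4, this is the case iff $\mathsf{SLat} \wedge (\mathsf{Mon}_f \wedge \mathsf{Mon}_g \wedge K)[\Psi(G)] \wedge G \models \bot$, where


$G = A \wedge B \wedge \neg(b \leq a)$, $\mathsf{est}(G) = \{g(a), g(c), f(b)\}$ and $\Psi(G) = \{g(a), g(c), f(b)\}$.
— $\mathsf{Mon}_f[\Psi(G)] = \{ b \leq b \rightarrow f(b) \leq f(b) \}$ (redundant).

— $\mathsf{Mon}_g[\Psi(G)] = \{ d_1 \leq d_2 \rightarrow g(d_1) \leq g(d_2) \mid d_1, d_2 \in \{a, c\} \}$.

— $K[\Psi(G)] = \{ b \leq g(a) \rightarrow f(b) \leq g(a),\ b \leq g(c) \rightarrow f(b) \leq g(c) \}$.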
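
The three instance sets can be enumerated directly from $\Psi(G)$. The sketch below is specific to this example: ground extension terms are encoded as pairs `(symbol, constant)`, instances are produced as plain strings, and the function names are hypothetical.

```python
from itertools import product

# Psi(G) = {g(a), g(c), f(b)} as (symbol, constant) pairs.
PSI = {("g", "a"), ("g", "c"), ("f", "b")}


def mon_instances(f, psi):
    """Instances of x <= y -> f(x) <= f(y) whose f-terms lie in psi."""
    args = sorted(c for sym, c in psi if sym == f)
    return [f"{x} <= {y} -> {f}({x}) <= {f}({y})"
            for x, y in product(args, repeat=2)]


def k_instances(psi):
    """Instances of y <= g(x) -> f(y) <= g(x) whose extension terms lie in psi."""
    xs = sorted(c for sym, c in psi if sym == "g")
    ys = sorted(c for sym, c in psi if sym == "f")
    return [f"{y} <= g({x}) -> f({y}) <= g({x})" for x, y in product(xs, ys)]


for clause in mon_instances("f", PSI) + mon_instances("g", PSI) + k_instances(PSI):
    print(clause)
```

This yields one (redundant) instance of $\mathsf{Mon}_f$, four instances of $\mathsf{Mon}_g$ and the two instances of $K$ listed above.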

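Step 1: We purify $(\mathsf{Mon}_f \wedge \mathsf{Mon}_g \wedge K)[\Psi(G)] \wedge G$ by introducing constants $a_1$ for
$g(a)$, $c_1$ for $g(c)$ and $b_1$ for $f(b)$ and obtain the formula $\mathsf{Def} \wedge A_0 \wedge B_0 \wedge \mathsf{Mon}_0 \wedge K_0$:

$$\begin{array}{l|l|l}
\mathsf{Def} & A_0 \wedge B_0 & \mathsf{Mon}_0 \wedge K_0 \\ \hline
D_A: a_1 \approx g(a) \wedge c_1 \approx g(c) & A_0: d \leq a_1 \wedge a \leq c \wedge c_1 \leq a & \mathsf{Mon}_0^A: a \triangleleft c \rightarrow a_1 \triangleleft c_1,\ \triangleleft \in \{\leq, \geq\} \\
D_B: b_1 \approx f(b) & B_0: b \leq d \wedge b \leq b_1 & K_{mix}: b \leq a_1 \rightarrow b_1 \leq a_1 \\
 & & \phantom{K_{mix}:}\ b \leq c_1 \rightarrow b_1 \leq c_1
\end{array}$$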


Step 2. $\Delta := \top$. Find clauses in $\mathsf{Mon}_0 \wedge K_0$ with premises entailed by $A_0 \wedge B_0 \wedge \Delta$.


$C = a \leq c \rightarrow a_1 \leq c_1$: $C$ is not mixed. Since $A_0 \wedge B_0 \models_{\mathsf{SLat}} a \leq c$, $A_0 \wedge B_0 \wedge (a \leq
c \rightarrow a_1 \leq c_1)$ is equivalent to $A_0 \wedge B_0 \wedge a_1 \leq c_1$. Let $\Delta := \{a_1 \leq c_1\}$.

$C = b \leq a_1 \rightarrow b_1 \leq a_1$: $C$ is mixed. Since $A_0 \wedge B_0 \wedge a_1 \leq c_1 \models b \leq a_1$ we
find a separating term. For this we use the method described in the proof of
Lemma 8. We consider the encoding $N_B \wedge P_b := (P_b \rightarrow P_d) \wedge (P_b \rightarrow P_{b_1}) \wedge P_b$.
Using ordered resolution with an ordering in which $P_b, P_{b_1} > P_d$ we derive
the unit clauses $P_d$ and $P_{b_1}$. Since $d$ is the only shared constant, $t = d$ is the
separating term. Thus, $A_0 \wedge B_0 \wedge a_1 \leq c_1 \models b \leq d \wedge d \leq a_1$. We now can
separate the instance $b \leq a_1 \rightarrow b_1 \leq a_1$ of the clause in $K$ by introducing
a new shared constant $d_1$ as a name for $f(d)$ and replacing the clause, as
described in the algorithm at the end of Sect. 4, with the conjunction of


(1) $b \leq d \rightarrow b_1 \leq d_1$ and
(2) $d \leq a_1 \rightarrow d_1 \leq a_1$


((1) is an instance of a monotonicity axiom, (2) is another instance of $K$), and
$A_0 \wedge B_0 \wedge a_1 \leq c_1 \wedge (b \leq d \rightarrow b_1 \leq d_1) \wedge (d \leq a_1 \rightarrow d_1 \leq a_1)$ is equivalent to
$A_0 \wedge B_0 \wedge a_1 \leq c_1 \wedge b_1 \leq d_1 \wedge d_1 \leq a_1$. Let $\Delta := \Delta \wedge b_1 \leq d_1 \wedge d_1 \leq a_1$.

Step 3: The last conjunction entails $b \leq a$. To compute a separating term, we
again use Lemma 8. We consider the encoding $N_B \wedge P_b := (P_b \rightarrow P_d) \wedge (P_b \rightarrow
P_{b_1}) \wedge (P_{b_1} \rightarrow P_{d_1}) \wedge P_b$ of the B-part of the conjunction, $B_0 \wedge b_1 \leq d_1$. Using
ordered resolution with an ordering in which $P_b, P_{b_1} > P_d, P_{d_1}$ we derive the unit
clauses $P_d$, $P_{b_1}$ and $P_{d_1}$. Since $d, d_1$ are the shared constants, $t = d \sqcap d_1$ is the
separating term. (It can be seen that already $d$ is a separating term.)


If $K$ contains axioms of type (3) then the theory of semilattices with operators
is not $\leq$-interpolating when sharing is regarded as intersection-sharing.
Indeed, assume that for every $K$ containing axioms of type (3), $\mathsf{SLat}_\Sigma(K)$ is
$\leq$-interpolating w.r.t. intersection-sharing. Then it would also be $\approx$-interpolating
w.r.t. intersection-sharing. This cannot be the case, as can be seen from the
following example.


Example 5. Consider the theory SLat_Σ(K) of semilattices with monotone
operators f, g satisfying the axioms K = {x ≤ g(y) → f(x) ≤ g(y)}, and let C be
a set of constants containing constants a, b, d, e. We show that this theory does
not have the Σ_S-Beth-definability property, where Σ_S = {g, e}.

Consider the conjunction of literals A = (a ≤ f(e)) ∧ (e ≤ g(b)) ∧ (g(b) ≤ a).
One can prove that a is implicitly definable w.r.t. {g, e} by proving, using the
hierarchical reduction for local theory extensions in Theorem 4, that:


(a ≤ f(e)) ∧ (e ≤ g(b)) ∧ (g(b) ≤ a) ∧ (a' ≤ f'(e)) ∧ (e ≤ g(b')) ∧ (g(b') ≤ a') ⊨_{SLat(K ∪ K')} a ≈ a'

We show that a is not explicitly definable w.r.t. {g, e}. If there exists a term t
containing only g and e such that (a ≤ f(e)) ∧ (e ≤ g(b)) ∧ (g(b) ≤ a) ⊨_{SLat_Σ(K)} a ≈ t,
then the interpretations of a and t are equal in every model of SLat_Σ(K) which is
a model of A. We show that this is not the case. Let S = ({a, e, b, d}, ⊓, f, g) be
the semilattice where d ≤ e ≤ a, d ≤ b and a ⊓ b = e ⊓ b = d, and f(a) = f(e) = a,
f(b) = f(d) = d, g(a) = g(e) = g(d) = d and g(b) = a. Then S satisfies A, f and
g are monotone, and S is a model of K: Assume that x ≤ g(y). If y ∈ {a, e, d}
then g(y) = d so x = d, and f(d) = d ≤ g(y). If y = b then g(b) = a, so x
can be a, e or d, and f(a) = f(e) = a, f(d) = d, so f(x) ≤ g(b) = a. A term
t containing only g and e can be e or can contain occurrences of g. If t = e
then the interpretation of t in S is not a. If t contains occurrences of g it can be
proven that the interpretation of t in S is d, i.e. is again different from a.
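The claims made about the finite semilattice S in Example 5 can be checked mechanically. The following Python sketch is only an illustration and is not taken from the prototype implementation mentioned in the conclusions; the names ELEMS, leq, meet, F, G and the helper functions are hypothetical choices made for this check. It encodes the order d ≤ e ≤ a, d ≤ b, verifies that S satisfies A, that f and g are monotone, that the axiom in K holds, and computes the values of all terms built from e, g and ⊓.

```python
from itertools import product

ELEMS = ["a", "b", "d", "e"]

# Partial order of S: d <= e <= a and d <= b (reflexive and transitive closure).
LEQ = {(x, x) for x in ELEMS} | {("d", "e"), ("e", "a"), ("d", "a"), ("d", "b")}


def leq(x, y):
    return (x, y) in LEQ


def meet(x, y):
    """Greatest lower bound of x and y in S."""
    lower = [z for z in ELEMS if leq(z, x) and leq(z, y)]
    for z in lower:
        if all(leq(w, z) for w in lower):
            return z
    raise ValueError("no meet")


# Interpretations of f and g as given in Example 5.
F = {"a": "a", "e": "a", "b": "d", "d": "d"}
G = {"a": "d", "e": "d", "d": "d", "b": "a"}


def satisfies_A():
    # A = (a <= f(e)) and (e <= g(b)) and (g(b) <= a)
    return leq("a", F["e"]) and leq("e", G["b"]) and leq(G["b"], "a")


def monotone(h):
    return all(leq(h[x], h[y]) for x, y in product(ELEMS, repeat=2) if leq(x, y))


def satisfies_K():
    # K = { x <= g(y) -> f(x) <= g(y) }
    return all(leq(F[x], G[y]) for x, y in product(ELEMS, repeat=2) if leq(x, G[y]))


def term_values():
    """Values in S of all terms built from the constant e, the operator g and meet."""
    vals = {"e"}
    while True:
        new = vals | {G[v] for v in vals} | {meet(u, v) for u in vals for v in vals}
        if new == vals:
            return vals
        vals = new
```

Since a does not occur among the computed term values, no term over {g, e} is interpreted as a in S, which is the argument used in the example.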

Thus T = SLat_{f,g}(K) does not have the Beth definability property w.r.t. Σ_S,
hence, by Theorem 3, T ∪ T' = SLat_{f,g}(K) ∪ SLat_{f',g}(K') = SLat_{f,f',g}(K ∪ K'),
where K' = {y ≤ g(x) → f'(y) ≤ g(x)}, does not have the ≈-interpolation
property w.r.t. intersection-sharing, hence it does not have the ≤-interpolation
property w.r.t. intersection-sharing. (By Theorem 10 and Theorem 3, T has
the Θ_K(Σ_S)-Beth definability property, where Θ_K(Σ_S) = {f, g, e}. Indeed, then
A ⊨ a ≈ f(e).)


6 Applications to EL and EL⁺-Subsumption


We now explain how these results can be used in the study of the description
logics EL and EL⁺. In any description logic a set N_C of concept names and a
set N_R of roles is assumed to be given. Concept descriptions can be defined with
the help of a set of concept constructors. The available constructors determine
the expressive power of a description logic. If we only allow intersection and
existential restriction as concept constructors, we obtain the description logic
EL [1], a logic used in terminological reasoning in medicine [29,30]. The table
below shows the constructor names used in EL and their semantics.


Constructor name        | Syntax   | Semantics
conjunction             | C₁ ⊓ C₂  | C₁^I ∩ C₂^I
existential restriction | ∃r.C     | {x | ∃y((x,y) ∈ r^I and y ∈ C^I)}


The semantics is given by interpretations I = (Δ, ·^I), where C^I ⊆ Δ and
r^I ⊆ Δ² for every C ∈ N_C, r ∈ N_R. The extension of ·^I to concept descriptions is
inductively defined using the semantics of the constructors. In [2,3], the extension
EL⁺ of EL with role inclusion axioms is studied.

A TBox (or terminology) is a finite set consisting of general concept inclusions
(GCI) of the form C ⊑ D, where C and D are concept descriptions. A CBox
consists of a TBox and a set of role inclusions of the form r₁ ∘ ⋯ ∘ r_n ⊑ s, so we
view CBoxes as unions GCI ∪ R of a set GCI of general concept inclusions and a
set R of role inclusions of the form r₁ ∘ ⋯ ∘ r_n ⊑ s, with n ≥ 1.³ An interpretation
I is a model of the CBox C = GCI ∪ R if it is a model of GCI, i.e., C^I ⊆ D^I for
every C ⊑ D ∈ GCI, and satisfies all role inclusions in C, i.e., r₁^I ∘ ⋯ ∘ r_n^I ⊆ s^I
for all r₁ ∘ ⋯ ∘ r_n ⊑ s ∈ R. If C is a CBox and C₁, C₂ are concept descriptions,
then C ⊨ C₁ ⊑ C₂ if and only if C₁^I ⊆ C₂^I for every model I of C.

In [23] we studied the link between TBox subsumption in EL and uniform
word problems in the corresponding classes of semilattices with monotone
functions. In [25], we showed that these results naturally extend to CBoxes and to
the description logic EL⁺. When defining the semantics of EL or EL⁺ with role
names N_R we use a class of ⊓-semilattices with monotone operators of the form
SLat_Σ, where Σ = {f_r | r ∈ N_R}. Every concept description C can be
represented as a term C̄; the encoding is inductively defined: Every concept name
C ∈ N_C is regarded as a constant C̄ = C. We define \overline{C₁ ⊓ C₂} := C̄₁ ⊓ C̄₂ and
\overline{∃r.C} := f_r(C̄). If R is a set of role inclusions of the form r ⊑ s and r₁ ∘ r₂ ⊑ s,
let K be the set of all axioms of the form:

∀x (f_r(x) ≤ f_s(x))            for all r ⊑ s ∈ R

∀x (f_{r₁}(f_{r₂}(x)) ≤ f_s(x))    for all r₁ ∘ r₂ ⊑ s ∈ R
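The encoding of concept descriptions as terms and of role inclusions as axioms is purely syntactic, so it can be sketched in a few lines. The following Python fragment is an illustration under stated assumptions: the classes Name, And and Exists, the functions encode and role_axioms, and the ASCII output syntax (meet(...) for ⊓, <= for ≤, f_r for the operator associated with role r) are hypothetical and are not taken from the implementation described in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A concept name C in N_C."""
    name: str


@dataclass(frozen=True)
class And:
    """A conjunction C1 ⊓ C2."""
    left: object
    right: object


@dataclass(frozen=True)
class Exists:
    """An existential restriction ∃r.C."""
    role: str
    filler: object


def encode(concept) -> str:
    """Translate a concept description into a semilattice term (as a string)."""
    if isinstance(concept, Name):
        return concept.name
    if isinstance(concept, And):
        return f"meet({encode(concept.left)}, {encode(concept.right)})"
    if isinstance(concept, Exists):
        return f"f_{concept.role}({encode(concept.filler)})"
    raise TypeError(f"unsupported concept: {concept!r}")


def role_axioms(inclusions):
    """Translate role inclusions, given as (chain, s) pairs, into axioms of K.

    Only chains of length 1 (r ⊑ s) and 2 (r1 ∘ r2 ⊑ s) are supported.
    """
    axioms = []
    for chain, s in inclusions:
        if len(chain) == 1:
            axioms.append(f"forall x. f_{chain[0]}(x) <= f_{s}(x)")
        elif len(chain) == 2:
            axioms.append(f"forall x. f_{chain[0]}(f_{chain[1]}(x)) <= f_{s}(x)")
        else:
            raise ValueError("only role chains of length 1 or 2 are supported")
    return axioms
```

For instance, encode(Exists("r", And(Name("C1"), Name("C2")))) yields the term f_r(meet(C1, C2)), mirroring the inductive definition above.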


Theorem 11 ([25]). Assume that the only concept constructors are intersection
and existential restriction. Then for all concept descriptions D₁, D₂ and every
EL⁺ CBox C = GCI ∪ R (where R consists of role inclusions of the form r ⊑ s
and r₁ ∘ r₂ ⊑ s) with concept names N_C = {C₁, ..., C_n} and set of roles N_R:


C ⊨ D₁ ⊑ D₂   iff   (⋀_{C ⊑ D ∈ GCI} C̄ ≤ D̄) ⊨_{SLat_Σ(K)} D̄₁ ≤ D̄₂,


where Σ is associated with N_R and K with R as described above.


In [8,31] the following notion of interpolation, which we call ⊑-interpolation, is
defined: A description logic has the ⊑-interpolation property if for any CBoxes
C_A = GCI_A ∪ R_A, C_B = GCI_B ∪ R_B and any concept descriptions C, D such that
C_A ∪ C_B ⊨ C ⊑ D there exists a concept description T containing only concept
and role symbols "shared" by {C_A, C} and {C_B, D} such that C_A ∪ C_B ⊨ C ⊑ T
and C_A ∪ C_B ⊨ T ⊑ D. By Theorem 11, C_A ∪ C_B ⊨ C ⊑ D iff A ∧ B ⊨_{SLat_Σ(K)}
C̄ ≤ D̄, where A = ⋀_{C₁ ⊑ C₂ ∈ GCI_A} C̄₁ ≤ C̄₂, B = ⋀_{C₁ ⊑ C₂ ∈ GCI_B} C̄₁ ≤ C̄₂, and K =
K_A ∪ K_B, the union of the axioms associated with the sets of role inclusions R_A resp.
R_B. By Theorem 10, there exists a term t containing only constants and function
symbols Θ_{K_A ∪ K_B}-shared by A and B such that A ∧ B ⊨_{SLat_Σ(K_A ∪ K_B)} C̄ ≤ t ∧ t ≤ D̄.
From t we can construct a concept description T containing only concept names
and roles shared by C_A and C_B, and by Theorem 11, C_A ∪ C_B ⊨ C ⊑ T ∧ T ⊑ D.
Therefore, the ⊑-interpolation problem studied for description logics in [8,31] can
be expressed in the case of EL and EL⁺ as a ≤-interpolation problem in the class
of semilattices with operators, and the hierarchical method for ≤-interpolation
can be used in this case. We distinguish between intersection-sharing and
Θ_R-sharing, where Θ_R is the analogon of Θ_K where K is the translation of R.


³ It can be shown that it is sufficient to consider role inclusions of the form r ⊑ s or
r₁ ∘ r₂ ⊑ s, where r, s, r₁, r₂ are role names [3].

Corollary 12. EL and EL⁺ have the ⊑-interpolation property w.r.t.
Θ_R-sharing. EL⁺ with role inclusions of the form r₁ ∘ r₂ ⊑ s does not have
⊑-interpolation w.r.t. intersection-sharing.


7 Conclusions and Future Work 


In this paper we gave a hierarchical method for P-interpolation in certain classes
of local theory extensions T₀ ⊆ T₀ ∪ K. We used these results for proving
≤-interpolation in classes of semilattices with monotone operators satisfying
additional clauses K with a suitable notion of Θ_K-sharing we defined. We defined
a form of Beth definability w.r.t. a subsignature Σ_S and used it to show that
the class of semilattices with operators under consideration does not have the
≤-interpolation property if only the common function symbols and constants are
considered to be "shared". We discussed how these results can be used for the
study of interpolation in EL and EL⁺.

The ideas were implemented in a prototype⁴ for the theory
of semilattices with operators satisfying axioms of type (1) considered in this
paper. The program is written in Python and uses Z3 [7] and SPASS [33] as
external provers. The program implements Steps 1-3 in the algorithm presented
at the end of Sect. 4 with the following optimization: In Step 1, after instantiation
and purification, an unsatisfiable core is computed with Z3 in order to reduce
the size of the set of instances of axioms to be considered. The program separates
the mixed instances by computing intermediate terms for their premises using
Theorem 8 and Proposition 6; for applying ordered resolution the prover SPASS
is used. In Step 3, the intermediate term T for C ≤ D is computed using the
method described in Theorem 8, again using SPASS.

For the use for interpolation in EL and EL⁺, the CBoxes C_A and C_B and
the subsumption C ⊑ D are given as an input. A minimal subset of C_A ∪ C_B
is computed from which C ⊑ D can be derived. (The user can choose between
a precise translation to SPASS or a propositional translation to Z3 which is
not always precise, but turned out to be a good approximation. Standard
implementations available for computing justifications of entailments from description
logic ontologies could be used as well.) The problem is then translated into a
problem for ≤-interpolation in semilattices with operators. After computing the
interpolating term, the result is expressed in the syntax of description logics.

In future work we will explore other application areas of these results, both to
classes of non-classical logics and to theories relevant in verification. We plan
to extend the implementation with possibilities of choosing the base theory and
the methods for P-interpolation in the base theory. We will further investigate
the links with Beth definability and possibilities of using Beth definability for
computing explicit definitions for implicitly definable terms, and analyze the
applicability of such results in description logics but also in verification.


⁴ The implementation and some tests can be found here:
https://userpages.uni-koblenz.de/~sofronie/p-interpolation-and-el/.

Abstract. Higher-order logic HOL offers a very simple syntax and semantics
for representing and reasoning about typed data structures. But its type system
lacks advanced features where types may depend on terms. Dependent type
theory offers such a rich type system, but has rather substantial conceptual
differences to HOL, as well as comparatively poor proof automation support.

We introduce a dependently-typed extension DHOL of HOL that retains the 
style and conceptual framework of HOL. Moreover, we build a translation from 
DHOL to HOL and implement it as a preprocessor to a HOL theorem prover, 
thereby obtaining a theorem prover for DHOL. 


1 Introduction and Related Work 


Theorem proving in higher-order logic (HOL) [5,11] has been a long-running research 
strand producing multiple mature interactive provers [10, 13, 17] and automated provers 
[2,4,23]. Similarly, many, mostly interactive, theorem provers are available for various 
versions of dependent type theory (DTT) [7,9,15,18]. However, it is (maybe surpris- 
ingly) difficult to develop theorem provers for dependently-typed higher-order logic 
(DHOL). 


In this paper, we use HOL to refer to a version of Church's simply-typed λ-calculus
with a base type bool for Booleans, simple function types →, and equality =_A : A →
A → bool. This already suffices to define the usual logical quantifiers and connectives.¹
Intuitively, it is straightforward to develop DHOL accordingly on top of the
dependently-typed λ-calculus, which uses a dependent function type Πx:A. B instead of →.
However, several subtleties arise that seem deceptively minor at first but end up
presenting fundamental theoretical issues. They come up already in the elementary expression
x =_A y ⇒ f(x) =_{B(x)} f(y) for some dependent function f : Πx:A. B(x).


Firstly, the equality f(x) =_{B(x)} f(y) is not even well-typed because the terms f(x) :
B(x) and f(y) : B(y) do not have the same type. Intuitively, it is obvious that the type
system can (and maybe should) be adjusted so that the equality x =_A y between terms
carries over to an equality B(x) ≡ B(y) between types.² However, this means that the
undecidability of equality leaks into the equality of types and thus into type-checking.


¹ We do not assume a choice operator or the axiom of infinity.


© The Author(s) 2023


While some interactive provers successfully use undecidable type systems [6, 16], most
formal systems for DTT commit to keeping type-checking decidable. The typical
approach goes back to Martin-Löf type theory [14] and the calculus of constructions [8]
and uses two separate equality relations, a decidable meta-level equality for use in the
type-checker and a stronger undecidable one subject to theorem proving. Moreover,
it favors the propositions-as-types representation and deemphasizes or omits a type of
classical Booleans. This approach has been studied extensively [7, 9, 15] and is not the
subject of this paper.


Instead, our motivation is to retain a single equality relation and classical Booleans. 
This is arguably more intuitive to users, especially to those outside the DTT community 
such as typical HOL users or mathematicians, and it is certainly much closer to the 
logics of the strongest available ATP systems. This means we have to pay the price of 
undecidable type-checking. The current paper was prompted by the observation that 
this price may be acceptable for two reasons: 


1. If our ultimate interest is theorem proving, undecidability comes up anyway. 
Indeed, it is plausible that the cost of showing the well-typedness of a conjecture 
will be negligible compared to the cost of proving it. 


2. As the strength of ATPs for HOL increases, the practical drawbacks of undecidable 
type-checking decrease, which indicates revisiting the trade-off from time to time. 
Indeed, if we position DHOL close to an existing HOL ATP, it is plausible that the 
price will, in practice, be affordable. 


Secondly, even if we add a rule like “if ⊢ x =_A y, then ⊢ B(x) ≡ B(y)” to our type
system, the above expression is still not well-typed: the equality x =_A y on the
left of ⇒ is needed to show the well-typedness of the equality f(x) =_{B(x)} f(y) on the
right. This intertwines theorem proving and type-checking even further. Concretely,
we need a dependent implication, where the first argument is assumed to hold while
checking the well-typedness of the second one. Formally, this means that to show ⊢
F ⇒ G : bool, we require ⊢ F : bool and F ⊢ G : bool. Similarly, we need a dependent
conjunction. And if we are classical, we may also opt to add a dependent disjunction
F ∨ G, where ¬F is assumed in G. Naturally, dependent conjunction and disjunction are
not commutative anymore. This may feel disruptive, but similar behavior of connectives
is well-known from short-circuit evaluation in programming languages.
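The analogy with short-circuit evaluation can be made concrete. The following Python sketch is only an illustration of that analogy and not of DHOL itself; the function names are hypothetical. In the first function, the right conjunct is only meaningful when the left one holds, just as the well-typedness of G may depend on F in a dependent conjunction; swapping the conjuncts makes the expression fail on the empty list.

```python
def first_is_positive(xs):
    # The right conjunct xs[0] > 0 is only evaluated if len(xs) > 0 holds,
    # so the expression is well-defined for every list.
    return len(xs) > 0 and xs[0] > 0


def first_is_positive_swapped(xs):
    # With the conjuncts swapped, xs[0] is evaluated first and raises
    # IndexError on the empty list: this conjunction is not commutative.
    return xs[0] > 0 and len(xs) > 0
```

Both functions agree on non-empty lists; they differ only in whether the expression is defined at all on the empty list, which mirrors how swapping the arguments of a dependent connective can destroy well-typedness rather than change the truth value.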


The meta-logical properties of dependent connectives are straightforward. However,
interestingly, these connectives can no longer be defined from just equality. At least one
of them (we will choose dependent implication) must be taken as an additional primitive
in DHOL along with =_A.


Finally, the above generalizations require a notion of DHOL-contexts that is more
complex than for HOL. HOL-contexts can be stratified into (a) a set of variable declarations
x_i : A_i, and (b) a set of logical assumptions F possibly using the variables x_i. Moreover,
the former are often not explicitly listed at all and instead inferred from the remainder of
the sequent. But in DHOL, the well-formedness of an A_i may now depend on previous
logical assumptions. To linearize this inter-dependency, DHOL contexts must consist
of a single list alternating between variable declarations and assumptions.


² Note that while term equality =_A is a bool-valued connective, type equality ≡ is not. Instead,
in HOL, ≡ is a judgment at the same level as the typing judgment t : A.


Contribution. Our contribution is twofold. Firstly, we introduce a new logic DHOL
designed along the lines described above. Moreover, we further extend DHOL with
predicate subtypes A|_p for a predicate p : A → bool on the type A. Besides dependent
types, these constitute a second important source of terms occurring in types. Because
they also make typing undecidable, they are often avoided. The most prominent
exception is PVS [16], whose kernel essentially arises by adding predicate subtypes to HOL.
In current HOL ITPs going back to [10], their use is usually restricted to the subtype
definition principle: here a definition b := A|_p may occur on toplevel and is elaborated
into a fresh type b that is axiomatized to mimic the subtype A|_p. Because we are
committed to undecidable typing anyway, predicate subtypes fit naturally into our approach.


Secondly, we develop and implement a sound and complete translation of DHOL into 
HOL. This setup allows the use of DHOL as the expressive user-facing language and 
HOL as the internal theorem-proving language. We position our implementation close 
to an existing HOL ATP, namely the LEO-III system. From the LEO-III perspective, 
DHOL serves as an additional input language that is translated into HOL by an external 
logic embedding tool [21,22] in the LEO-III ecosystem. Because LEO-III already sup- 
ports such embeddings and because the TPTP syntax [24] foresees the use of dependent 
types in ATPs and provides syntax for them (albeit without a normative semantics), we 
were able to implement the translation with no disruptions to existing workflows. 


The general idea of our translation of dependent into simple type theory is not new [3].
In that work, Martin-Löf-style dependent type theory is translated into Gordon's HOL
ITP [10]. This work differs critically from ours because it uses DTT in
propositions-as-types style. Our work builds DHOL with classical Booleans and an equality predicate,
which makes the task of proving the translation sound and complete very different.
Moreover, their work targets an interactive prover while ours targets automated ones.


Overview. In Sect. 2 we recap the HOL logic. In Sect. 3 we extend it to DHOL and
define our translation from DHOL to HOL. In Sect. 4 we add subtyping and predicate
subtypes. In Sect. 5 we prove the soundness and completeness of the translation. In
Sect. 6 we describe how to use our translation and a HOL ATP to implement a theorem
prover for DHOL.


2 Preliminaries: Higher-Order Logic 


We introduce the syntax and rules of HOL. Our definitions are standard except that we 
tweak a few details in order to later present the extension to DHOL more succinctly. 
We use the following grammar for HOL:

T             ::=  ∘ | T, a:tp | T, c:A | T, c:F              theories
Γ             ::=  . | Γ, x:A | Γ, x:F                        contexts
A, B          ::=  a | A → B | bool                           types
s, t, f, F, G ::=  c | x | λx:A. t | f t | s =_A t | F ⇒ G    terms


A theory T is a list of base type declarations a : tp, typed constant declarations c : A,
and named axioms c : F asserting the formula F. A context Γ has the same form except
that no type variables are allowed. It is not strictly necessary to use named axioms and
assumptions, but it makes our extensions to DHOL later on simpler. We write ∘ and .
for the empty theory and context, respectively. At this point, it is possible to normalize
contexts into a set of variable declarations followed by a set of assumptions because
the well-formedness of a type A can never depend on a variable or an assumption. But
that property will change when going to DHOL, which is why we allow Γ to alternate
between variables and assumptions.


Types A are either user-declared types a, the built-in base type bool, or function types
A → B. Terms are constants c, variables x, λ-abstractions λx:A. t, function applications
f t, or obtained from the built-in bool-valued connectives =_A or ⇒. As usual [1], this
suffices to define all the usual quantifiers and connectives true, false, ¬, ∧, ∨, ∀ and ∃.
This includes ⇒, but we make it a primitive here because we will change it in DHOL.
As usual, E[x/t] denotes the capture-avoiding substitution of the variable x with the term
t within expression E.


The type and proof system uses the judgments given below. Note that we need a
meta-level judgment for the equality of types because ≡ is not a bool-valued connective. On
the contrary, the equality of terms ⊢ s =_A t is a special case of the validity judgment
⊢ F. In HOL, ≡ is trivial, and the judgment is redundant. But we include it here already
because it will become non-trivial in DHOL.


Name              | Judgment      | Intuition
theories          | ⊢ T Thy       | T is well-formed theory
contexts          | ⊢_T Γ Ctx     | Γ is well-formed context
types             | Γ ⊢_T A tp    | A is well-formed type
typing            | Γ ⊢_T t : A   | t is a well-formed term of well-formed type A
validity          | Γ ⊢_T F       | well-formed Boolean F is provable
equality of types | Γ ⊢_T A ≡ B   | well-formed types A and B are equal


The rules are given in Fig. 1. We assume that all names in a theory or a context are 
unique without making that explicit in the rules. Following common practice, we further 
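assume that HOL types are non-empty.

Theories and contexts:

                 ⊢ T Thy            ⊢_T A tp          ⊢_T F : bool
  ---------    ---------------    --------------    ---------------
  ⊢ ∘ Thy      ⊢ T, a:tp Thy      ⊢ T, c:A Thy      ⊢ T, c:F Thy

   ⊢ T Thy        Γ ⊢_T A tp        Γ ⊢_T F : bool
  -----------   ----------------   ----------------
  ⊢_T . Ctx     ⊢_T Γ, x:A Ctx     ⊢_T Γ, x:F Ctx

Lookup in theory and context:

  a:tp in T   ⊢_T Γ Ctx     c:A' in T   Γ ⊢_T A' ≡ A     c:F in T   ⊢_T Γ Ctx
  ---------------------     ------------------------     --------------------
       Γ ⊢_T a tp                  Γ ⊢_T c : A                  Γ ⊢_T F

  x:A' in Γ   Γ ⊢_T A' ≡ A     x:F in Γ   ⊢_T Γ Ctx
  ------------------------     --------------------
        Γ ⊢_T x : A                  Γ ⊢_T F

Well-formedness and equality of types:

   ⊢_T Γ Ctx       Γ ⊢_T A tp   Γ ⊢_T B tp     Γ ⊢_T A tp     Γ ⊢_T A ≡ A'   Γ ⊢_T B ≡ B'
  --------------   -----------------------    ------------    ---------------------------
  Γ ⊢_T bool tp        Γ ⊢_T A → B tp         Γ ⊢_T A ≡ A      Γ ⊢_T A → B ≡ A' → B'

Typing:

     Γ, x:A ⊢_T t : B          Γ ⊢_T f : A → B   Γ ⊢_T t : A     Γ ⊢_T s : A   Γ ⊢_T t : A
  ------------------------     ----------------------------     -------------------------
  Γ ⊢_T (λx:A. t) : A → B            Γ ⊢_T f t : B                Γ ⊢_T s =_A t : bool

Term equality: congruence, reflexivity, symmetry, β, η:

  Γ ⊢_T A ≡ A'   Γ, x:A ⊢_T t =_B t'      Γ ⊢_T t =_A t'   Γ ⊢_T f =_{A→B} f'
  -----------------------------------     ----------------------------------
   Γ ⊢_T λx:A. t =_{A→B} λx:A'. t'               Γ ⊢_T f t =_B f' t'

   Γ ⊢_T t : A       Γ ⊢_T t =_A s        Γ ⊢_T (λx:A. s) t : B          Γ ⊢_T t : A → B   x not in Γ
  --------------    ---------------    ----------------------------     ----------------------------
  Γ ⊢_T t =_A t     Γ ⊢_T s =_A t      Γ ⊢_T (λx:A. s) t =_B s[x/t]      Γ ⊢_T t =_{A→B} λx:A. t x

Rules for implication:

  Γ ⊢_T F : bool   Γ ⊢_T G : bool     Γ ⊢_T F : bool   Γ, x:F ⊢_T G     Γ ⊢_T F ⇒ G   Γ ⊢_T F
  -------------------------------     -----------------------------     ---------------------
       Γ ⊢_T F ⇒ G : bool                      Γ ⊢_T F ⇒ G                     Γ ⊢_T G

Congruence for validity, Boolean extensionality, and non-emptiness of types:

  Γ ⊢_T F =_bool F'   Γ ⊢_T F'     Γ ⊢_T p true   Γ ⊢_T p false     Γ ⊢_T F : bool   Γ, x:A ⊢_T F
  ----------------------------     ----------------------------     -----------------------------
            Γ ⊢_T F                   Γ, x:bool ⊢_T p x                        Γ ⊢_T F


Fig. 1. HOL Rules

3 Dependent Function Types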


3.1 Language 


We have carefully defined HOL in such a way that only a few surgical changes are 
needed to define DHOL. A consolidated summary of DHOL is given in Appendix A.2 
in the extended preprint [20]. The grammar is as follows with unchanged parts shaded 
out: 


T             ::=  ∘ | T, a:(Πx:A.)* tp | T, c:A | T, c:F     theories
Γ             ::=  . | Γ, x:A | Γ, x:F                        contexts
A, B          ::=  a t₁ ... t_n | Πx:A. B | bool              types
s, t, f, F, G ::=  c | x | λx:A. t | f t | s =_A t | F ⇒ G    terms


Concretely, base types a may now take term arguments and simple function types
A → B are replaced with dependent function types Πx:A. B. As usual we will retain
the notation A → B for the latter if x does not occur free in B. DHOL is a conservative
extension of HOL, and we recover HOL as the fragment of DHOL in which all base
types a have arity 0.


Example 1 (Category Theory). As a running example, we formalize the theory of a 
category in DHOL. It declares the base type obj for objects and the dependent base 
type mor a b for morphisms. Further it declares the constants id and comp for identity 
and composition, and the axioms for neutrality. We omit the associativity axiom for 
brevity. 


obj   : tp
mor   : Πx,y:obj. tp
id    : Πa:obj. mor a a
comp  : Πa,b,c:obj. mor a b → mor b c → mor a c
neutL : ∀x,y:obj. ∀m:mor x y. m ∘ id_x =_{mor x y} m
neutR : ∀x,y:obj. ∀m:mor x y. id_y ∘ m =_{mor x y} m


Here we use a few intuitive notational simplifications such as writing Πx,y:obj. for
binding two variables of the same type. We also use the notations id_x for id x and h ∘ g
for comp _ _ _ g h where the _ denote inferable arguments of type obj.


The judgments stay the same and we only make minor changes to the rules, which we
explain in the sequel. Firstly we replace all rules for → with the ones for Π:


  Γ ⊢_T A tp   Γ, x:A ⊢_T B tp     Γ ⊢_T A ≡ A'   Γ, x:A ⊢_T B ≡ B'
  ----------------------------     --------------------------------
       Γ ⊢_T Πx:A. B tp              Γ ⊢_T Πx:A. B ≡ Πx:A'. B'

      Γ, x:A ⊢_T t : B            Γ ⊢_T f : Πx:A. B   Γ ⊢_T t : A
  --------------------------     ------------------------------
  Γ ⊢_T (λx:A. t) : Πx:A. B           Γ ⊢_T f t : B[x/t]

  Γ ⊢_T A ≡ A'   Γ, x:A ⊢_T t =_B t'      Γ ⊢_T t =_A t'   Γ ⊢_T f =_{Πx:A. B} f'
  -----------------------------------     --------------------------------------
  Γ ⊢_T λx:A. t =_{Πx:A. B} λx:A'. t'          Γ ⊢_T f t =_{B[x/t]} f' t'

         Γ ⊢_T t : Πx:A. B
  -------------------------------
  Γ ⊢_T t =_{Πx:A. B} λx:A. t x


Then we replace the rules for declaring, using, and equating base types with the ones
where base types are applied to arguments:

    ⊢_T x₁:A₁, ..., x_n:A_n Ctx
  ---------------------------------------
  ⊢ T, a : Πx₁:A₁. ... Πx_n:A_n. tp Thy

  ⊢_T Γ Ctx   a : Πx₁:A₁. ... Πx_n:A_n. tp in T
  Γ ⊢_T t₁ : A₁   ...   Γ ⊢_T t_n : A_n[x₁/t₁]...[x_{n-1}/t_{n-1}]
  ----------------------------------------------------------------
                     Γ ⊢_T a t₁ ... t_n tp

  ⊢_T Γ Ctx   a : Πx₁:A₁. ... Πx_n:A_n. tp in T
  Γ ⊢_T s₁ =_{A₁} t₁   ...   Γ ⊢_T s_n =_{A_n[x₁/t₁]...[x_{n-1}/t_{n-1}]} t_n
  ---------------------------------------------------------------------------
                  Γ ⊢_T a s₁ ... s_n ≡ a t₁ ... t_n


The last of these is the critical rule via which term equality leaks into type equality.
Thus, typing of expressions may now depend on equality assumptions and thus typing
becomes undecidable.


Example 2 (Undecidability of Typing). Continuing Example 1, consider terms ⊢ f :
mor u v and ⊢ g : mor v' w for terms ⊢ u, v, v', w : obj. Then ⊢ g ∘ f : mor u w holds iff
⊢ f : mor u v', which holds iff ⊢ v =_obj v'. Depending on the axioms present, this may
be arbitrarily difficult to prove.


Finally, we modify the rule for the non-emptiness of types: we allow the existence of 
empty dependent types and only require that for each HOL type in the image of the 
translation there exists one non-empty DHOL type translated to it (rather than requiring 
all dependent types translated to it to be non-empty). And we replace the typing rule for 
implication with the dependent one. The proof rules for implications are unchanged. 


  Γ ⊢_T F : bool   Γ, x:F ⊢_T G : bool
  ------------------------------------
         Γ ⊢_T F ⇒ G : bool


Example 3 (Dependent Implication). Continuing Example 1, consider the formula

x:obj, y:obj ⊢ x =_obj y ⇒ id_x =_{mor x x} id_y : bool

which expresses that equal objects have equal identity morphisms. It is easy to prove.
But it is only well-typed because the typing rule for dependent implication allows using
x =_obj y while type-checking id_x =_{mor x x} id_y : bool, which requires deriving id_y :
mor x x and thus mor y y ≡ mor x x.

All the usual connectives and quantifiers can be defined in any of the usual ways now.
However, the details matter for the dependent versions of the connectives. In particular,
we choose F ∧ G := ¬(F ⇒ ¬G) and F ∨ G := ¬F ⇒ G in order to obtain the dependent
versions of conjunction and disjunction, in which the well-formedness of G may depend
on the truth or falsity of F, respectively.


3.2 Translation 


We define a translation function X ↦ X̄ that maps any DHOL-syntax X to HOL-syntax.
Its intuition is to erase type dependencies by translating all types a t₁ ... t_n to a and
replacing every Π with →. To recover the information of the erased dependencies, we
additionally define a partial equivalence relation (PER) A* on Ā for every DHOL-type
A.


In general, a PER r on type U is a symmetric and transitive relation on U. This is
equivalent to r being an equivalence relation on a subtype of U. The intuitive meaning of our
translation is that the DHOL-type A corresponds in HOL to the quotient of the
appropriate subtype of Ā by the equivalence A*. In particular, the predicate A* t̄ t̄ captures
whether t̄ represents a term of type A. More formally, the correspondence is:


DHOL        | HOL
type A      | type Ā and PER A* : Ā → Ā → bool
term t : A  | term t̄ : Ā satisfying A* t̄ t̄
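The remark that a PER is the same thing as an equivalence relation on a subtype can be illustrated on a finite carrier. The following Python sketch is a small illustration under stated assumptions: relations are represented as sets of pairs, and the function names is_per, domain and is_equivalence_on are hypothetical. The domain of a PER r consists of the elements x with r x x, which plays the role of the predicate A* t̄ t̄ above.

```python
def is_per(rel):
    """Check that a relation (a set of pairs) is symmetric and transitive."""
    symmetric = all((y, x) in rel for (x, y) in rel)
    transitive = all(
        (x, w) in rel for (x, y) in rel for (z, w) in rel if y == z
    )
    return symmetric and transitive


def domain(rel):
    """Elements related to themselves: the subtype on which rel is reflexive."""
    return {x for (x, y) in rel if x == y}


def is_equivalence_on(rel, dom):
    """Check that rel is an equivalence relation on dom and relates nothing outside dom."""
    reflexive = all((x, x) in rel for x in dom)
    within = all(x in dom and y in dom for (x, y) in rel)
    return reflexive and within and is_per(rel)
```

For example, on the carrier {0, 1, 2, 3} the relation {(0,0), (1,1), (0,1), (1,0), (2,2)} is a PER whose domain is {0, 1, 2}; the element 3 is not related to itself and therefore does not represent any element of the corresponding subtype.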


Definition 1 (Translation). We translate DHOL-syntax by induction on the grammar.
Theories and contexts are translated declaration-wise:

\overline{∘} := ∘     \overline{T, D} := T̄, D̄     \overline{.} := .     \overline{Γ, D} := Γ̄, D̄

where D̄ is a list of declarations.


The translation \overline{a : Πx₁:A₁. ... Πx_n:A_n. tp} of a base type declaration is given by

a : tp,   a* : Ā₁ → ... → Ā_n → a → a → bool,

a_PER : ∀x₁:Ā₁. ... ∀x_n:Ā_n. ∀u,v:a. a* x₁ ... x_n u v ⇒ u =_a v


Thus, a is translated to a base type of the same name without arguments and a trivial
PER for every argument tuple. Intuitively, a* t̄₁ ... t̄_n u u defines the subtype of the
HOL-type a corresponding to the DHOL-type a t₁ ... t_n.


Constant and variable declarations are translated by adding the assumptions that they
are in the PER of their type, and axioms and assumptions are translated
straightforwardly:

\overline{c : A} := c : Ā, c* : A* c c          \overline{x : A} := x : Ā, x* : A* x x

The cases of Ā and A* for types A are:

\overline{a t₁ ... t_n} := a          (a t₁ ... t_n)* s t := a* t̄₁ ... t̄_n s t
\overline{Πx:A. B} := Ā → B̄          (Πx:A. B)* f g := ∀x,y:Ā. A* x y ⇒ B* (f x) (g y)
\overline{bool} := bool              bool* s t := s =_bool t


Finally, the cases for terms are straightforward except for, crucially, translating equality
to the respective PER:

c̄ := c                              x̄ := x
\overline{λx:A. t} := λx:Ā. t̄        \overline{f t} := f̄ t̄
\overline{s =_A t} := A* s̄ t̄         \overline{F ⇒ G} := F̄ ⇒ Ḡ
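The type cases of the translation are mechanical enough to sketch in code. The following Python fragment is an illustration only and is not the logic embedding tool used with LEO-III; the classes Bool, Base and Pi, the functions erase and per, and the string syntax of the output are hypothetical. Argument terms of base types are assumed to be given as already-translated strings, and the second bound variable of the Π case is obtained by priming the first, which is assumed not to clash with other names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Bool:
    """The type bool."""


@dataclass(frozen=True)
class Base:
    """A base type a t1 ... tn; args holds the translated argument terms."""
    name: str
    args: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Pi:
    """A dependent function type Pi var:dom. cod."""
    var: str
    dom: object
    cod: object


def erase(ty) -> str:
    """Erased HOL type: drop arguments of base types, turn Pi into ->."""
    if isinstance(ty, Bool):
        return "bool"
    if isinstance(ty, Base):
        return ty.name
    if isinstance(ty, Pi):
        return f"({erase(ty.dom)} -> {erase(ty.cod)})"
    raise TypeError(f"unsupported type: {ty!r}")


def per(ty, s: str, t: str) -> str:
    """The formula A* s t for the PER associated with the DHOL type ty."""
    if isinstance(ty, Bool):
        return f"({s} = {t})"
    if isinstance(ty, Base):
        return " ".join([f"{ty.name}*", *ty.args, s, t])
    if isinstance(ty, Pi):
        x, y = ty.var, ty.var + "'"  # assumption: the primed name is fresh
        premise = per(ty.dom, x, y)
        conclusion = per(ty.cod, f"({s} {x})", f"({t} {y})")
        return f"(forall {x} {y}:{erase(ty.dom)}. {premise} => {conclusion})"
    raise TypeError(f"unsupported type: {ty!r}")
```

Applied to the type Πa:obj. mor a a of id from Example 1, erase yields obj -> mor and per with both arguments id yields the statement that related objects have related identity morphisms.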


Example 4 (Translating Derived Connectives). If we define true, false, ¬ as usual in
HOL and use the definition for dependent conjunction from above, it is straightforward
to show that all DHOL-connectives are translated to their HOL-counterparts. For
example, we have (up to logical equivalence in HOL) that \overline{F ∧ G} = F̄ ∧ Ḡ.


We also define the quantifiers in the usual way, e.g., using ∀x:A. F(x) := λx:A. F(x)
=_{A→bool} λx:A. true. Then applying our translation yields


\overline{∀x:A. F(x)} = (A → bool)* (λx:Ā. \overline{F(x)}) (λx:Ā. true)

                      = ∀x,y:Ā. A* x y ⇒ bool* \overline{F(x)} true


This looks clunky, but (because A* is a PER as shown in Theorem 1) is equivalent to
∀x:Ā. A* x x ⇒ \overline{F(x)}. Thus, DHOL-∀ is translated to HOL-∀ relativized using A* x x.
The corresponding rule \overline{∃x:A. F(x)} = ∃x:Ā. A* x x ∧ \overline{F(x)} can be shown accordingly.


Example 5 (Categories in HOL). We give a fragment of the translation of Example 1:


obj : tp            obj* : obj → obj → bool
mor : tp            mor* : obj → obj → mor → mor → bool
id : obj → mor      id* : ∀x,y:obj. obj* x y ⇒ mor* x x (id x) (id y)
comp : obj → obj → obj → mor → mor → mor
neutL : ∀x:obj. obj* x x ⇒ ∀y:obj. obj* y y ⇒
        ∀m:mor. mor* x y m m ⇒ mor* x y (comp x x y (id x) m) m


Here, for brevity, we have omitted obj_PER, mor_PER, and comp* and have already used
the translation rule for ∀ from Example 4. The result is structurally close to what a
native formalization of categories in HOL would look like, but somewhat clunkier.


Theorem Proving in Dependently-Typed Higher-Order Logic 447 


Typing rules for predicate subtypes: 


TR; p : Ix: A.bo è T Hrt:A Trp pt Prt: Al, 
T Hy Al, tp Trt: Al, TH, pt 


Congruence and variance rule for predicate subtypes: 


THpA=A T Hr P =p boo P Tk, A<: A T,x:AFrpx>px 
TF; Al, = Aly T Fr Al, <: Aly 


Rules that relate A and A|, : 


PEAS A TH, Atp TH, Atp 
Tr, Al, <: A) TH, ASAI I Fr Aliwa. me = A 


Àx:A. true 


Variance rules for other DHOL types: 


TRASA TH, Al <: A DT, x: A’ Rp B <: B' 
TF, A <: A’ TF, Ix: A.B <: Tx: A’. B’ 


Rules for normalizing certain subtypes: 
Tk, Atp T,x:Ak, Btp T,x:At,p: Ty: B. bool 
Db, Tx: A. (B|,) = Mx: A. Bylapyeca pg 


Tk, Atp Tepp: Tx: A.bool THp q : Tx: (Al,). bool 
T Fr Ally = Aleapau 


Fig. 2. Additional Rules for Predicate Subtypes 


4 Predicate Subtypes 


To add predicate subtypes, we extend the grammar with the production A ::= A|_p.
No new productions for terms are needed because the inhabitants of A|_p use the same
syntax as those of A.


Example 6 (Isomorphisms). We continue Example 1 and use predicate subtypes to
write the type isomorphisms u of automorphisms on u as a subtype of mor u u. We
can define isomorphisms u := (mor u u)|_p where the predicate p is given by


λm:mor u u. ∃i:mor u u. (i ∘ m =_{mor u u} id_u) ∧ (m ∘ i =_{mor u u} id_u)


Adding subtyping requires a few extensions to our type system. First we add a
judgment Γ ⊢_T A <: B and replace the lookup rules for variables and constants with their
subtyping-aware variants:


  c:A' in T   Γ ⊢_T A' <: A     x:A' in Γ   Γ ⊢_T A' <: A
  -------------------------     -------------------------
        Γ ⊢_T c : A                   Γ ⊢_T x : A

Then we add the rules given in Fig. 2. These induce an algorithm for deciding
subtyping relative to an oracle for the undecidable validity judgment. The latter enters the
algorithm when two predicate subtypes are compared. Note that the type-equality rule
for A|_p|_q uses a dependent conjunction.


The resulting system is a conservative extension of the variants of HOL and DHOL
without subtyping: we recover these systems as the fragments that do not use A|_p. In
particular, in that case A <: B is trivial and holds iff A ≡ B holds.


Finally, we extend our translation by adding the cases for predicate subtypes: 


Definition 2 (Translation). We extend Definition 1 with 


$$\overline{A|_p} := \overline{A} \qquad (A|_p)^*\ s\ t := A^*\ s\ t \wedge \overline{p}\ s \wedge \overline{p}\ t$$


5 Soundness and Completeness 


Now we establish that our translation is faithful, i.e. sound and complete. We will use
the terms sound and complete from the perspective of using a HOL-ATP for theorem
proving in DHOL, e.g., sound means if $\overline{F}$ is a HOL-theorem, then $F$ is a
DHOL-theorem, and complete is the dual.³


The completeness theorem states that our translation preserves all DHOL-judgments.
Moreover, the theorem statement clarifies the intuition behind the translation's
invariants:


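Theorem 1 (Completeness). We have

if in DHOL                    | then in HOL
$\vdash T\ \mathrm{Thy}$      | $\vdash \overline{T}\ \mathrm{Thy}$
$\vdash_T \Gamma\ \mathrm{Ctx}$ | $\vdash_{\overline{T}} \overline{\Gamma}\ \mathrm{Ctx}$
$\Gamma \vdash_T A\ \mathrm{tp}$ | $\overline{\Gamma} \vdash_{\overline{T}} \overline{A}\ \mathrm{tp}$ and $\overline{\Gamma} \vdash_{\overline{T}} A^* : \overline{A} \to \overline{A} \to \mathrm{bool}$ and $A^*$ is a PER
$\Gamma \vdash_T A \equiv B$  | $\overline{\Gamma} \vdash_{\overline{T}} \overline{A} \equiv \overline{B}$ and $\overline{\Gamma}, x, y : \overline{A} \vdash_{\overline{T}} A^*\ x\ y =_{\mathrm{bool}} B^*\ x\ y$
$\Gamma \vdash_T A <: B$      | $\overline{\Gamma} \vdash_{\overline{T}} \overline{A} \equiv \overline{B}$ and $\overline{\Gamma}, x, y : \overline{A} \vdash_{\overline{T}} A^*\ x\ y \Rightarrow B^*\ x\ y$
$\Gamma \vdash_T t : A$       | $\overline{\Gamma} \vdash_{\overline{T}} \overline{t} : \overline{A}$ and $\overline{\Gamma} \vdash_{\overline{T}} A^*\ \overline{t}\ \overline{t}$
$\Gamma \vdash_T F$           | $\overline{\Gamma} \vdash_{\overline{T}} \overline{F}$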


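Additionally the substitution lemma holds, i.e.,

$$\Gamma, x{:}A \vdash_T t : B \text{ and } \Gamma \vdash_T u : A \quad\text{implies}\quad \overline{\Gamma} \vdash_{\overline{T}} \overline{t[x/u]} =_{\overline{B}} \overline{t}[x/\overline{u}]$$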


Proof. The proof proceeds by induction and can be found in Appendix B of the 
extended preprint [20]. 


³ If, however, we think of our translation as an interpretation function that maps syntax to
semantics, we could also justify swapping the names of the theorems.

The reverse direction is much trickier. To understand why, we look at two canaries in
the coal mine that we have used to reject multiple intuitive but untrue conjectures:


Example 7 (Non-Injectivity of the Translation). Continuing Example 1, assume terms
$u, v : \mathrm{obj}$ and consider the identity functions $I_u := \lambda f{:}\mathrm{mor}\ u\ u.\ f$ and
$I_v := \lambda f{:}\mathrm{mor}\ v\ v.\ f$. Both are translated to the same HOL-term
$\overline{I_u} = \overline{I_v} = \lambda f{:}\mathrm{mor}.\ f$ (because $I_u$ and $I_v$ only differ in the type
indices, which are erased by our translation).


Consequently, the ill-typed DHOL-Boolean $b := I_u =_{\mathrm{mor}\ u\ u \to \mathrm{mor}\ u\ u} I_v$ is translated to
the HOL-Boolean $\lambda f{:}\mathrm{mor}.\ f =_{\mathrm{mor} \to \mathrm{mor}} \lambda f{:}\mathrm{mor}.\ f$, which is not only well-typed but
even a theorem.
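
The erasure effect in Example 7 can be illustrated with a small sketch. The classes `BaseType` and `Lam` and the functions `erase_type` and `erase_term` are hypothetical names introduced only for this illustration; they model just enough syntax to show that dropping type indices identifies two distinct DHOL terms and are not the translation implemented by the authors.

```python
from dataclasses import dataclass

# Hypothetical miniature syntax: a dependent base type applied to term indices.
@dataclass(frozen=True)
class BaseType:
    name: str            # e.g. "mor"
    indices: tuple = ()  # term arguments, e.g. ("u", "u")

# A lambda abstraction whose body is just a variable name (enough for the example).
@dataclass(frozen=True)
class Lam:
    var: str
    var_type: BaseType
    body: str

def erase_type(ty: BaseType) -> str:
    """Erase all term indices of a dependent base type, keeping only its name."""
    return ty.name

def erase_term(t: Lam) -> str:
    """Translate the lambda by erasing the indices in its type annotation."""
    return f"λ{t.var}:{erase_type(t.var_type)}. {t.body}"

I_u = Lam("f", BaseType("mor", ("u", "u")), "f")
I_v = Lam("f", BaseType("mor", ("v", "v")), "f")

print(I_u == I_v)                          # distinct as DHOL terms
print(erase_term(I_u), erase_term(I_v))    # identical after erasure
```

Because the images coincide, an equation between the two terms becomes a reflexivity instance after erasure, which is why well-typedness has to be checked on the DHOL side.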


To better understand the underlying issue we introduce the notion of spurious terms.
The well-typed translation $\overline{t}$ of a DHOL-term $t$ is called spurious if $t$ is ill-typed
(otherwise it is called proper). Intuitively, we should be able to use the PERs $A^*$ to deal with
spurious terms: to type-check $t : A$ in DHOL, we want to use $A^*\ \overline{t}\ \overline{t}$ in HOL. But even
that is tricky:


Example 8 (Trivial PERs for Built-In Base Types). Consider the property $\mathrm{bool}^*\ x\ x$. Our
translation guarantees $\mathrm{bool}^*$ true true and $\mathrm{bool}^*$ false false. Thus, we can use Boolean
extensionality to prove in HOL that $\forall x{:}\mathrm{bool}.\ \mathrm{bool}^*\ x\ x$, making the property trivial. In
particular, we can prove $\mathrm{bool}^*\ \overline{b}\ \overline{b}$ for the spurious Boolean $b$ from Example 7. Even
worse, the property $(\Pi x{:}A.\ B)^*\ x\ x$ is trivial in this way whenever it is for $B$ and thus
for all n-ary bool-valued function types.


More generally, this degeneration effect occurs for every base type that is built into both 
DHOL and HOL and that is translated to itself. bool is the simplest example of that kind, 
and the only one in the setting described here. But reasonable language extensions like 
built-in base types a for numbers, strings, etc. would suffer from the same issue. This 
is because all of these types would come with built-in induction principles that derive a 
universal property from its ground instances, at which point a* x x becomes trivial. 


Note, however, that the degeneration effect does not occur for user-declared base types.
For example, consider a theory that declares a base type N for the natural numbers and
an induction axiom for it. N would not be translated to itself but to a fresh HOL-type in
whose induction axiom the quantifier $\forall$ is relativized by $N^*\ x\ x$. Consequently, $N^*\ x\ x$ is
not trivial and can be used to reject spurious terms.


These examples show that we cannot expect the reverse directions of the statements
in Theorem 1 to hold in general. However, we can show the following property that is
sufficient to make our translation well-behaved:


Theorem 2 (Soundness). Assume a well-formed DHOL-theory $\vdash T\ \mathrm{Thy}$.
If $\Gamma \vdash_T F : \mathrm{bool}$ and $\overline{\Gamma} \vdash_{\overline{T}} \overline{F}$, then $\Gamma \vdash_T F$.


In particular, if $\Gamma \vdash_T s : A$ and $\Gamma \vdash_T t : A$ and $\overline{\Gamma} \vdash_{\overline{T}} A^*\ \overline{s}\ \overline{t}$, then $\Gamma \vdash_T s =_A t$.

Proof. The key idea is to transform a HOL-proof of $\overline{F}$ into one that is in the image of
the translation, at which point we can read off a DHOL-proof of $F$. The full proof is
given in Appendix B of the extended preprint [20].


Intuitively, the reverse directions of Theorem 1 hold once we establish that all involved
expressions are well-typed in DHOL. Thus, we can use a HOL-ATP to prove
DHOL-conjectures if we validate independently that the conjecture is well-typed all along. In
the remainder of the section, we develop the necessary type-checking algorithm for
DHOL.


Type-Checking. Inspecting the rules of DHOL, we observe that all DHOL-judgments
would be decidable if we had an oracle for the validity judgment $\Gamma \vdash_T F$. Indeed, our
DHOL-rules are already written in a way that essentially allows reading off a bidirectional
type-checking algorithm. It only remains to split the typing judgment $\Gamma \vdash_T t : A$
into two algorithms for type-inference (which computes $A$ from $t$) and type-checking
(which takes $t$ and $A$ and returns yes or no) and to aggregate the rules for subtyping into
an appropriate pattern-match.


The construction is routine, and we have implemented the resulting algorithm in our
MMT/LF logical framework [12, 19].⁴ The oracle for the validity judgment is provided
by our translation and a theorem prover for HOL (see Sect. 6). It remains to show that
whenever the algorithm calls the oracle for $\Gamma \vdash_T F$, we do in fact have that $\Gamma \vdash_T F : \mathrm{bool}$
so that Theorem 2 is applicable. Formally, we show the following:


Theorem 3. Relative to an oracle for $\Gamma \vdash_T F$, consider a derivation of some
DHOL-judgment, in which the children of each node are ordered according to the left-to-right
order of the assumptions in the statement of the applied rule.


If the oracle calls are made in depth-first order, then each such call satisfies
$\Gamma \vdash_T F : \mathrm{bool}$.


Proof. We actually prove, by induction on derivations, the more general statement
that each rule preserves the following preconditions:


Judgment                                              | Precondition
$\vdash_T \Gamma\ \mathrm{Ctx}$                       | $\vdash T\ \mathrm{Thy}$
$\Gamma \vdash_T A\ \mathrm{tp}$                      | $\vdash_T \Gamma\ \mathrm{Ctx}$
$\Gamma \vdash_T t : A$                               | $\Gamma \vdash_T A\ \mathrm{tp}$ (post-condition when used as type-inference)
$\Gamma \vdash_T F$                                   | $\Gamma \vdash_T F : \mathrm{bool}$
$\Gamma \vdash_T A \equiv B$ or $\Gamma \vdash_T A <: B$ | $\Gamma \vdash_T A\ \mathrm{tp}$ and $\Gamma \vdash_T B\ \mathrm{tp}$

Note that rules whose conclusion is a validity judgment can be ignored because they
are replaced by the oracle anyway.

⁴ The formalization of DHOL in MMT is available at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/logic/hol_like/dhol.mmt. The example theories given throughout this paper
and a few example conjectures are available at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/casestudies/2023-cade.


The most interesting case is the rule for $\Gamma \vdash_T a\ s_1 \ldots s_n \equiv a\ t_1 \ldots t_n$. Here, the
left-to-right order of assumptions is critical because $\Gamma \vdash_T s_1 =_{A_1} t_1$ may be needed to show,
e.g., $\Gamma \vdash_T s_2 =_{A_2[x_1/s_1]} t_2 : \mathrm{bool}$.


6 Theorem Prover Implementation 


We have integrated our translation as a preprocessor to the HOL ATP LEO-III [23].
We chose this ATP because its existing preprocessor infrastructure already includes a
powerful logic embedding tool [21, 22]. However, with a little more effort, other HOL
ATPs work as well.


Furthermore, we developed a bridge between the MMT logical framework [19] and
LEO-III (both of which are written in the same programming language). This allows us
to use our MMT-based type-checker for DHOL with our LEO-III-based theorem prover
to obtain a full-fledged implementation of DHOL. Moreover, this system can immediately
use MMT's logic-independent frontend features like IDE and module system.


Alternatively, we can use LEO-III as a general purpose DHOL-ATP that accepts input
in TPTP. Even though TPTP does not officially sanction DHOL as a logic, it anticipates
dependent function types and already provides syntax for them (although, to our
knowledge, no ATP system has made use of it so far). Concretely, TPTP represents
the type $\Pi x{:}A.\ B$ as !>[X:A]:B and a base type $a\ t_1 \ldots t_n$ as a @ t1 ... @ tn.
TPTP does not yet provide syntax for predicate subtypes, i.e., this approach is currently
limited to the no-subtyping fragment of DHOL. But extending the TPTP syntax with
predicate subtypes would be straightforward, e.g., by using A ?| p to represent the
type $A|_p$.
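
As a purely string-level illustration of the two notations just described, the following sketch renders a dependent function type and an applied base type. The helper names `tptp_base`, `tptp_pi` and `paren` are hypothetical, and the parenthesization of the codomain is our assumption; the output has not been validated against a TPTP parser.

```python
def tptp_base(name: str, *args: str) -> str:
    """Render a base type applied to term arguments as `a @ t1 @ ... @ tn`."""
    return " @ ".join((name,) + args)

def paren(s: str) -> str:
    """Wrap a compound expression in parentheses (an assumption, for readability)."""
    return f"({s})" if " " in s else s

def tptp_pi(var: str, domain: str, codomain: str) -> str:
    """Render the dependent function type Pi var:domain. codomain as `!>[X:A]:B`."""
    return f"!>[{var}:{domain}]:{paren(codomain)}"

# Hypothetical example: the type of identity morphisms, Pi x:obj. mor x x
identity_type = tptp_pi("X", "obj", tptp_base("mor", "X", "X"))
print(identity_type)
```

Keeping the rendering functions this small makes it easy to swap in a different concrete syntax, for instance the `A ?| p` suggestion for predicate subtypes.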


The encoding of the conjecture given in Example 3 using the theory from Example 1 is
given at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/casestudies/2023-cade/CategoryTheory/category-theory-lemmas-dhol.p (which also includes further
example conjectures relative to the same theory). Running the logic embedding tool
translates it into the TPTP TH0 problem given at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/casestudies/2023-cade/CategoryTheory/category-theory-lemmas-hol.p. Unsurprisingly, LEO-III can prove this simple theorem easily.


Practical Evaluation. In order to evaluate the practical usefulness of the translation
we studied various example conjectures about function composition in set theory and
category theory. We considered 5 further lemmas based on the theory in Example 1
which are written directly in TPTP and can all be proven by E, Vampire and cvc5. We
also studied various harder lemmas about function composition and category theory.
Those examples are written in MMT and take advantage of advanced MMT features to
improve readability, such as definitions, user-defined notations, and implicit arguments
that are inferred by the prover.
The examples can be found at https://gl.mathhub.info/MMT/LATIN2/-/blob/devel/source/casestudies/2023-cade. The MMT prover successfully type-checks all problems
and translates them into TPTP problems to be solved by HOL ATPs.


Since LEO-III can solve none of the 6 function composition examples, we also tested
other HOL ATPs on the generated TPTP problems. Running all HOL ATP provers
supported at https://www.tptp.org/cgi-bin/SystemOnTPTP on the function composition
problems shows that many provers can solve 3 of the problems, Vampire can solve 4 of
them, and 5 out of the 6 conjectures can be solved by at least one HOL ATP.


We also studied 6 more difficult theorems about limits in category theory including the 
uniqueness, commutativity, and associativity of some limits. To better evaluate the use- 
fulness of our translation, we also formalized these lemmas in native HOL (in MMT) 
and compared the results. Naturally, the DHOL formalization is significantly more 
readable and benefits from the more expressive type system that can help spot mis- 
takes in the formalization. Running the HOL ATPs from https://www.tptp.org/cgi-bin/ 
SystemOnTPTP on the generated TPTP problems (with 60 s timeout) yields the results 
in the table below (where we omit provers that proved none of the theorems in either 
formalization). 


HOL ATP      | lemma 1 proven     | lemma 2 proven     | lemma 3 proven
             | DHOL   native HOL  | DHOL   native HOL  | DHOL   native HOL
agsyHOL      | yes    no          | no     no          | yes    no
cocATP       | yes    no          | no     no          | no     no
cvc5         | yes    yes         | no     no          | yes    no
cvc5-SAT     | yes    no          | no     no          | no     no
E            | yes    yes         | no     no          | no     yes
HOLyHammer   | yes    yes         | no     no          | yes    yes
Lash         | yes    yes         | no     no          | no     no
LEO-II       | yes    no          | no     no          | no     no
Leo-III      | yes    yes         | no     no          | no     no
Leo-III-SAT  | yes    yes         | no     no          | no     no
Satallax     | yes    yes         | no     no          | yes    no
Vampire      | yes    yes         | no     no          | no     yes
Zipperpin    | yes    yes         | no     no          | yes    yes
total        | 13     9           | 0      0           | 5      4

HOL ATP      | lemma 4 proven     | lemma 5 proven     | lemma 6 proven
             | DHOL   native HOL  | DHOL   native HOL  | DHOL   native HOL
agsyHOL      | no     no          | no     no          | no     no
cocATP       | no     no          | no     no          | no     no
cvc5         | no     yes         | no     no          | no     no
cvc5-SAT     | no     no          | no     no          | no     no
E            | no     yes         | no     yes         | no     yes
HOLyHammer   | no     yes         | no     no          | no     yes
Lash         | no     no          | no     no          | no     no
LEO-II       | no     no          | no     no          | no     no
Leo-III      | no     no          | no     no          | no     no
Leo-III-SAT  | no     no          | no     no          | no     no
Satallax     | no     no          | yes    no          | no     no
Vampire      | no     yes         | no     yes         | no     yes
Zipperpin    | no     yes         | yes    yes         | no     yes
total        | 0      5           | 2      3           | 0      4


Overall more problems generated from the native HOL formalization can be solved by 
some HOL ATP (5/6 compared to 3/6 for the DHOL formalization). The HOL ATPs 
found 25 successful proofs for the native HOL problems and 20 for the DHOL prob- 
lems. This suggests that current HOL ATPs can prove native HOL problems somewhat 
better than their translated DHOL counterparts, but not much better. In 8 cases a prover 
can prove the DHOL conjecture but not the native HOL analogue, indicating that the 
two formalizations have different advantages. 
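
The aggregate figures quoted above can be recomputed from the per-prover results. The sketch below is only a bookkeeping aid: the dictionary `RESULTS` transcribes the table rows (each pair of letters is DHOL/native HOL for lemmas 1 to 6, `y` = proven), the row labels `cvc5` and `Leo-III` follow our reading of the table, and all function and variable names are hypothetical.

```python
# Each row: six "dn" pairs (d = DHOL result, n = native HOL result), y = proven.
RESULTS = {
    "agsyHOL":     "yn nn yn nn nn nn",
    "cocATP":      "yn nn nn nn nn nn",
    "cvc5":        "yy nn yn ny nn nn",
    "cvc5-SAT":    "yn nn nn nn nn nn",
    "E":           "yy nn ny ny ny ny",
    "HOLyHammer":  "yy nn yy ny nn ny",
    "Lash":        "yy nn nn nn nn nn",
    "LEO-II":      "yn nn nn nn nn nn",
    "Leo-III":     "yy nn nn nn nn nn",
    "Leo-III-SAT": "yy nn nn nn nn nn",
    "Satallax":    "yy nn yn nn yn nn",
    "Vampire":     "yy nn ny ny ny ny",
    "Zipperpin":   "yy nn yy ny yy ny",
}

def outcomes(row):
    """Split a row into six (dhol_proven, native_proven) boolean pairs."""
    return [(pair[0] == "y", pair[1] == "y") for pair in row.split()]

table = {prover: outcomes(row) for prover, row in RESULTS.items()}

# Total number of successful proofs per formalization.
dhol_proofs = sum(d for rows in table.values() for d, _ in rows)
native_proofs = sum(n for rows in table.values() for _, n in rows)
# Cases where a prover succeeds on the DHOL version but not on the native one.
dhol_only = sum(d and not n for rows in table.values() for d, n in rows)
# Number of lemmas solved by at least one prover.
dhol_lemmas = sum(any(rows[i][0] for rows in table.values()) for i in range(6))
native_lemmas = sum(any(rows[i][1] for rows in table.values()) for i in range(6))

print(dhol_proofs, native_proofs, dhol_only, dhol_lemmas, native_lemmas)
# prints: 20 25 8 3 5
```

These values agree with the 20 and 25 successful proofs, the 8 DHOL-only cases, and the 3/6 versus 5/6 solved lemmas reported in the text.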


Furthermore, our translation has so far been engineered for generality and
soundness/completeness and not for ATP efficiency. Indeed, future work has multiple options to
boost the ATP performance on translated DHOL, e.g., by


— developing sufficient criteria for when simpler HOL theories can be produced 


— inserting lemmas into the translated theories that guide proof search in ATPs, e.g., 
to speed up equality reasoning 


— adding definitions to translated DHOL problems and developing better criteria 
when to expand them 


Thus, we consider the test results to be very promising. In particular, the translation 
could serve as a useful basis for type-checkers and hammer tools for DHOL ITPs. 


7 Conclusion and Future Work 


We have combined two features of standard languages, higher-order logic HOL and
dependent type theory DTT, thereby obtaining the new dependently-typed higher-order
logic DHOL. Contrary to HOL, DHOL allows for dependent function types. Contrary
to DTT, DHOL retains the simplicity of classical Booleans and standard equality.


On the downside, we have to accept that DHOL, unlike both HOL and DTT, has an
undecidable type system. Further work will show how heavily this disadvantage weighs in
practical theorem proving applications. But we anticipate that the drawback is manage- 
able, especially if, as in our case, an implementation of DHOL is coupled tightly with 
a strong ATP system. We accomplish this with a sound and complete translation from 
DHOL into HOL that enables using existing HOL ATPs to discharge the proof obliga- 
tions that come up during type-checking. We have implemented our novel translation as 
a TPTP-to-TPTP preprocessor for HOL ATP systems and outlined the implementation 
of a type-checker and hammer tool for DHOL based on the resulting prover. 


Moreover, once this design is in place, it opens up the possibility to add certain type 
constructors to DHOL that are often requested by users but difficult to provide for sys- 
tem developers because they automatically make typing undecidable. We have shown 
an extension of DHOL with predicate subtypes as an example. Quotients, partial func- 
tions, or fixed-length lists are other examples that can be supported in future work. 


We expect that our translation remains sound and complete if DHOL is extended with
other features underlying common HOL systems such as built-in types for numbers, 
the axiom of infinity, or the subtype definition principle. How to extend DHOL with 
a choice operator remains a question for future work — if solved, this would allow 
extending existing HOL ITPs to DHOL. 


Acknowledgment. Chad Brown and Alexander Steen provided valuable feedback on earlier ver- 
sions of this paper. 
Abstract. This paper describes anti-unification algorithms for computing
least general generalizations of two expressions in a functional programming
language with recursive let. First, by exploring a semantic approach
to the problem, we argue for an improvement of the technique used
in previous papers which avoids infinite chains of properly descending
generalizations. Second, we present a (non-deterministic) nominal general
anti-unification algorithm applicable to general expressions, which is
complete, terminating and requires polynomial time. Third, we propose a
specialized anti-unification algorithm applicable to two or more garbage-free
ground expressions that produces a single least general generalization
in polynomial time, and which can also exploit further semantically
correct equivalences. Our results have potential applications in finding
clones in functional programs.


Keywords: Anti-Unification · Nominal Techniques · Generalization ·
Functional Programming · Recursive Let


1 Introduction 


Anti-unification problems (a.k.a. generalization problems) consist in finding a
least general generalization (lgg) of two or more given expressions. This problem
has interesting applications in computer science and software engineering,
such as symbolic mathematical computing [21], proof generalization [10], and clone
detection [8], among others; an overview is [6]. Early proposals to apply
generalization for analyzing and improving programs by syntactic manipulations were
given by Plotkin [12] and Reynolds [13].

We are interested in the anti-unification problem for languages with binders,
such as the lambda-calculus, the pi-calculus, or the more general nominal
language [11]. For instance, $\lambda x.Z$ is a generalization of the lambda-expressions
$\lambda a.\mathrm{app}(a,a)$, $\lambda a.\lambda b.a$, and $\lambda c.c$. In fact, from $\lambda x.Z$ one can retrieve any of the
three expressions in the set by considering the appropriate instance of $Z$ (where
capturing is permitted), modulo renaming of bound variables: $Z \mapsto \mathrm{app}(x, x)$,
$Z \mapsto \lambda b.x$ and $Z \mapsto x$, respectively.

In the context of languages with recursive let (letrec), techniques for solving
anti-unification problems would allow, for instance, to identify the program
scheme letr b.(λx.N); a.(λx.M) in b(y) as a generalization of the program [1]


letr even.(λx. if-else (x = 0) (true) (odd(x − 1)));
     odd.(λx. if-else (x = 0) (false) (even(x − 1)))
in (even y)


or even identify both fragments of programs as possible clones [8].

In general, and as illustrated above, reasoning and automated deduction in
higher order languages often require, as a very basic operation, to identify
expressions up to α-equivalence. This means expressions are identified if
they are syntactically equal up to a renaming of bound variables (which represent
the binding structure). In addition, one has to have in mind that the
letrec construct also satisfies laws like commutativity and associativity of its
environment (e.g. we could permute the environment b.(λx.N); a.(λx.M) as
a.(λx.M); b.(λx.N) above), which will be working in combination with binding
primitives (i.e., also rename the bindings within the environment obtaining,
e.g., c.(λx.M'); d.(λx.N')), and they also may occur nested.

Checking expressions for α-equivalence is an operation that is often performed
on large and complex expressions. Ad-hoc algorithms for checking
α-equivalence of such expressions are worst-case exponential due to searching for
all possible permutations and renamings. An approach to handle α-equivalence
in deduction systems is to use nominal techniques [5,11], where the focus is to
ease formula specification and deduction rather than speeding up α-equivalence
checking. In general, checking α-equivalence with the language extended with
letrec using nominal techniques is a GI-hard problem [18]. Here, we follow the
nominal approach to handle binding of names and their renaming.

In [17] we have proposed a semantic approach to anti-unification based on
nominal techniques which uses atom-variables, and significantly improves an
existing approach [4] to anti-unification for languages with binders, since it
provides a finitary set of least general generalizations. In this work we propose a
simplification of this semantic approach to a nominal language extended by the
letrec construct, which we call $\mathrm{NLL}_X$.


Our Results. We provide a nominal anti-unification algorithm (ANTIUNIFLETR)
for $\mathrm{NLL}_X$ which preserves the good properties of our semantic approach: it is
terminating, sound, computes an exponential number of generalizations (Theorem
1) and weakly complete (Theorem 2). Completeness is achieved after further
specialization of the computed generalization (Theorem 3).

The observation that garbage might be present in letrec expressions (for
example, useless bindings in environments), and that it can be avoided by a
semantically correct garbage collection algorithm, allows us to apply the results and
methods in [18], which shows that α-equivalence and further algorithms could
be considerably improved for garbage-free expressions. This leads to the design
of ANTIUNIFNOGARBAGE, an anti-unification algorithm for ground garbage-free
expressions, that is terminating, runs in polynomial time and produces one least
general generalization, i.e. it is unitary (Theorem 4).


2 Preliminaries 


We consider a countably infinite set of atoms $A$ of (concrete) symbols $a, b$ which
we usually denote in a meta-fashion; so we can use symbols $a, b$ also with indices
(the variables in lambda-calculus). We also consider a set $F$ of function symbols
with arity $ar(\cdot)$, and a countably infinite set of expression-variables $Var$ ranged
over by $X, Y$. We will use mappings on atoms from $A$: a swapping $(a\ b)$ is a
bijective function that maps atom $a$ to atom $b$, atom $b$ to $a$, and is the identity
on other atoms. We will also use finite permutations $\pi$ on atoms from $A$, which
consist of a composition of swappings: in fact, every finite permutation $\pi$ can
be represented by a composition of at most $(|dom(\pi)| - 1)$ swappings, where
$dom(\pi) = \{a \in A \mid \pi(a) \neq a\}$. The identity permutation is denoted $Id$.
Composition $\pi_1 \circ \pi_2$ and the inverse $\pi^{-1}$ can be immediately computed, where the
complexity is polynomial in the size of $dom(\pi)$.
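
A minimal sketch of these operations, assuming a finite permutation is stored as a Python dictionary that lists only the atoms it moves. The function names `swap`, `apply`, `compose`, `inverse` and `dom` are hypothetical and chosen for this illustration; `compose(p1, p2)` applies `p2` first, matching $\pi_1 \circ \pi_2$.

```python
def swap(a, b):
    """The swapping (a b): maps a to b, b to a, and is the identity elsewhere."""
    return {a: b, b: a}

def apply(pi, x):
    """Apply a finite permutation to an atom; atoms not listed are fixed."""
    return pi.get(x, x)

def compose(p1, p2):
    """Return p1 ∘ p2 (apply p2 first, then p1), dropping fixed points."""
    result = {}
    for x in set(p1) | set(p2):
        y = apply(p1, apply(p2, x))
        if y != x:
            result[x] = y
    return result

def inverse(pi):
    """Invert a permutation by reversing each mapping."""
    return {image: atom for atom, image in pi.items()}

def dom(pi):
    """dom(pi) = set of atoms that are actually moved."""
    return {a for a, b in pi.items() if a != b}

# (a b) ∘ (b d): two swappings yield a permutation moving three atoms.
pi = compose(swap("a", "b"), swap("b", "d"))
print(sorted(pi.items()))
print(sorted(dom(pi)))
print(compose(pi, inverse(pi)))   # the identity, represented by {}
```

Every operation touches each atom of the domains a constant number of times, which is consistent with the polynomial bound stated above.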


Ground Expressions. The syntax of expressions $\tilde{e}$ of the (ground) language NLL
with recursive let is:


$$\tilde{e} ::= a \mid \lambda a.\tilde{e} \mid (f\ \tilde{e}_1 \ldots \tilde{e}_{ar(f)}) \mid (\mathtt{letr}\ a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n\ \mathtt{in}\ \tilde{e})$$


Ground expressions are either atoms, abstractions of an atom in an expression,
function applications, or letrec expressions. We assume that binding atoms
$a_1, \ldots, a_n$ in a letrec-expression $(\mathtt{letr}\ a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n\ \mathtt{in}\ \tilde{e})$ are pairwise distinct.
Sequences of bindings $a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n$ may be abbreviated as env (environment).
The scope of atom $a$ in $\lambda a.\tilde{e}$ is standard: $a$ has scope $\tilde{e}$. The letr-construct
has a special scoping rule: in $(\mathtt{letr}\ a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n\ \mathtt{in}\ \tilde{e})$, every atom $a_i$ that is
free in some $\tilde{e}_j$ or $\tilde{e}$ is bound by the environment $a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n$. This defines
in NLL the notion of free atoms $FA(\tilde{e})$, bound atoms $BA(\tilde{e})$ in expression $\tilde{e}$, and
all atoms $AT(\tilde{e})$ that occur in $\tilde{e}$. For an environment $env = \{a_1.\tilde{e}_1, \ldots, a_n.\tilde{e}_n\}$,
we define the set of letrec-atoms as $LA(env) = \{a_1, \ldots, a_n\}$. We say $a$ is fresh
for $\tilde{e}$ iff $a \notin FA(\tilde{e})$, denoted as $a \# \tilde{e}$.
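
The scoping rules can be made concrete with a short sketch that computes free atoms for ground expressions. The tuple encoding (`"lam"`, `"fn"`, `"letr"` tags, atoms as strings) and the function names `free_atoms` and `is_fresh` are assumptions made for this illustration only.

```python
# Hypothetical encoding of ground NLL expressions:
#   atom                      -> "a"                      (a Python string)
#   abstraction  λa.e         -> ("lam", "a", e)
#   application  (f e1 .. en) -> ("fn", "f", [e1, ..., en])
#   letrec                    -> ("letr", [("a1", e1), ...], body)

def free_atoms(e):
    """Compute FA(e) following the scoping rules of λ and letr."""
    if isinstance(e, str):
        return {e}
    tag = e[0]
    if tag == "lam":
        _, a, body = e
        return free_atoms(body) - {a}
    if tag == "fn":
        _, _, args = e
        return set().union(*map(free_atoms, args))
    if tag == "letr":
        _, env, body = e
        bound = {a for a, _ in env}                      # LA(env)
        inner = free_atoms(body).union(*(free_atoms(rhs) for _, rhs in env))
        return inner - bound                             # env binds everywhere
    raise ValueError(f"unknown expression: {e!r}")

def is_fresh(a, e):
    """a # e iff a is not free in e."""
    return a not in free_atoms(e)

# letr a.cons x b; b.cons y a in a   (x and y stand for two ground expressions)
infinite_list = ("letr",
                 [("a", ("fn", "cons", ["x", "b"])),
                  ("b", ("fn", "cons", ["y", "a"]))],
                 "a")
print(sorted(free_atoms(infinite_list)))
print(is_fresh("a", infinite_list), is_fresh("x", infinite_list))
```

Note that, unlike a non-recursive let, the bound atoms are removed from the free atoms of every right-hand side as well as of the body.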


Remark 1. The base language NLL is a lambda calculus extended with function
constants and a recursive let constructor letr, and can also be interpreted as an
untyped fragment of Haskell [7]. The function application operator in functional
languages (implicit in some languages) can be encoded by a binary function app,
and the case-construct in its plain form can be encoded as an application.


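Example 1. The letrec-expression $(\mathtt{letr}\ a.\mathrm{cons}\ \tilde{e}_1\ b;\ b.\mathrm{cons}\ \tilde{e}_2\ a\ \mathtt{in}\ a)$ represents
an infinite list $(\mathrm{cons}\ \tilde{e}_1\ (\mathrm{cons}\ \tilde{e}_2\ (\mathrm{cons}\ \tilde{e}_1\ (\mathrm{cons}\ \tilde{e}_2 \ldots))))$, where $\tilde{e}_1, \tilde{e}_2$ are
expressions and cons is the usual list constructor taken as a function symbol.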
Syntactic α-equivalence on NLL is defined, following [16], as an
extension of usual α-equivalence, where in addition the expressions
$(\mathtt{letr}\ a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n\ \mathtt{in}\ \tilde{e})$ and $(\mathtt{letr}\ a'_1.\tilde{e}'_1; \ldots; a'_n.\tilde{e}'_n\ \mathtt{in}\ \tilde{e}')$ are α-equivalent iff
the expressions can be made equal by correctly renaming them, possibly reordering
the environment.


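Definition 1. The α-equivalence $\sim_\alpha$ on $\tilde{e} \in$ NLL is defined as follows:


- $a \sim_\alpha a$ for atoms $a$.
- If $\tilde{e}_i \sim_\alpha \tilde{e}'_i$ for all $i$, then $(f\ \tilde{e}_1 \ldots \tilde{e}_n) \sim_\alpha (f\ \tilde{e}'_1 \ldots \tilde{e}'_n)$ for n-ary $f \in F$.
- If $\tilde{e} \sim_\alpha \tilde{e}'$, then $\lambda a.\tilde{e} \sim_\alpha \lambda a.\tilde{e}'$.
- If $a \# \tilde{e}'$ and $\tilde{e} \sim_\alpha (a\ b) \cdot \tilde{e}'$, then $\lambda a.\tilde{e} \sim_\alpha \lambda b.\tilde{e}'$.
- $(\mathtt{letr}\ a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n\ \mathtt{in}\ \tilde{e}) \sim_\alpha (\mathtt{letr}\ a_{\rho(1)}.\tilde{e}_{\rho(1)}; \ldots; a_{\rho(n)}.\tilde{e}_{\rho(n)}\ \mathtt{in}\ \tilde{e})$
  for any permutation $\rho$ of $\{1, \ldots, n\}$.
- The following holds for a permutation $\pi$ on atoms $\{a_1, \ldots, a_n\} \cup \{a'_1, \ldots, a'_n\}$:

$$\frac{\forall i.\ \pi(a_i) = a'_i \quad \pi \cdot \tilde{e}_i \sim_\alpha \tilde{e}'_i \quad \pi \cdot \tilde{e} \sim_\alpha \tilde{e}' \quad a_i \# (\mathtt{letr}\ a'_1.\tilde{e}'_1; \ldots; a'_n.\tilde{e}'_n\ \mathtt{in}\ \tilde{e}')}{(\mathtt{letr}\ a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n\ \mathtt{in}\ \tilde{e}) \sim_\alpha (\mathtt{letr}\ a'_1.\tilde{e}'_1; \ldots; a'_n.\tilde{e}'_n\ \mathtt{in}\ \tilde{e}')}$$

  where, for $i = 1, \ldots, n$: the $a_i$ are pairwise distinct, and the $a'_i$ are pairwise
  distinct.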


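Permutations operate on NLL-expressions by recursing on their structure. For
example, $\pi \cdot (\mathtt{letr}\ a_1.\tilde{e}_1; \ldots; a_n.\tilde{e}_n\ \mathtt{in}\ \tilde{e}) = (\mathtt{letr}\ \pi{\cdot}a_1.\pi{\cdot}\tilde{e}_1; \ldots; \pi{\cdot}a_n.\pi{\cdot}\tilde{e}_n\ \mathtt{in}\ \pi{\cdot}\tilde{e})$.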


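General Expressions. The syntax of the nominal higher-order language $\mathrm{NLL}_X$
with letrec and variables is:


$$e, s, t ::= a \mid \pi \cdot X \mid \lambda a.e \mid (f\ e_1 \ldots e_{ar(f)}) \mid (\mathtt{letr}\ a_1.e_1; \ldots; a_n.e_n\ \mathtt{in}\ e)$$
$$\pi ::= \emptyset \mid (a\ b) \cdot \pi$$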


General expressions extend NLL with suspensions, i.e., expressions of the form
$\pi \cdot X$, which denotes a variable $X$ (also called a generalization variable) in
which a permutation is suspended: $\pi$ is waiting for some instantiation of $X$
before its action. The basic properties and functions of NLL such as $FA(e)$,
$BA(e)$, scope, fresh, etc., extend to $\mathrm{NLL}_X$ as expected. In particular, $AT(e)$
is extended to suspensions as $AT(\pi \cdot X) = \{a \mid a \in dom(\pi)\}$. The suspension
$Id \cdot X$ is written simply as $X$. We define $Head(s)$ either as the top function
symbol in $\{a, f, \lambda, \mathtt{letr}\}$ or $Head(\pi \cdot X)$ as $X$. More generally, for a
non-variable expression $e$, the expression $\pi \cdot e$ means an operation, which is
performed by shifting $\pi$ into the expression, using the additional simplification
$\pi_1 \cdot (\pi_2 \cdot e) \to (\pi_1 \circ \pi_2) \cdot e$, where after the shift, $\pi$ only remains in suspensions.
For instance, $(a\ c) \cdot (\mathtt{letr}\ a.(\lambda b.X)\ \mathtt{in}\ f(a))$ denotes a renaming of $a$ to $c$ and
vice-versa, which is equal to $(\mathtt{letr}\ c.(\lambda b.(a\ c) \cdot X)\ \mathtt{in}\ f(c))$.

An $\mathrm{NLL}_X$-freshness constraint is an expression of the form $a \# e$, expressing
that $a$ is not free in (or is fresh for) $e$, where $e$ is an $\mathrm{NLL}_X$-expression. A conjunction
(or set) of freshness constraints is called freshness context which is written
using the notation $\nabla, \Delta$. Every $\mathrm{NLL}_X$-freshness context can be transformed into
a simpler one (flattened form) using the rules in Fig. 1 exhaustively until consisting
only of constraints of the form $a \# X$ or $\bot$ (fail), which are called atomic. An
$\mathrm{NLL}_X$-freshness context $\nabla$ is consistent if its flattened form does not contain $\bot$.
The definition of α-equivalence extends to $\mathrm{NLL}_X$ as expected. In the following,
$[s]_\alpha$ denotes the equivalence class of the expression $s$ induced by the equivalence
relation $\sim_\alpha$.

$$\frac{\{a \# b\} \cup \nabla}{\nabla} \qquad \frac{\{a \# (\pi \cdot X)\} \cup \nabla}{\{\pi^{-1}(a) \# X\} \cup \nabla} \qquad \frac{\{a \# (f\ s_1 \ldots s_n)\} \cup \nabla}{\{a \# s_1, \ldots, a \# s_n\} \cup \nabla}$$

$$\frac{\{a \# (\lambda a.s)\} \cup \nabla}{\nabla} \qquad \frac{\{a \# a\} \cup \nabla}{\bot} \qquad \frac{\{a \# (\lambda b.s)\} \cup \nabla}{\{a \# s\} \cup \nabla}\ \text{if } a \neq b$$

$$\frac{\{a \# (\mathtt{letr}\ a_1.s_1; \ldots; a_n.s_n\ \mathtt{in}\ r)\} \cup \nabla}{\nabla}\ \text{if } a \in \{a_1, \ldots, a_n\}$$

$$\frac{\{a \# (\mathtt{letr}\ a_1.s_1; \ldots; a_n.s_n\ \mathtt{in}\ r)\} \cup \nabla}{\{a \# s_1, \ldots, a \# s_n, a \# r\} \cup \nabla}\ \text{if } a \notin \{a_1, \ldots, a_n\}$$

Fig. 1. Simplification of freshness constraints in $\mathrm{NLL}_X$


Lemma 1. Simplification using rules of Fig. 1 constitutes a polynomial decision
algorithm for satisfiability of $\nabla$: If $\bot$ is in the result, then unsatisfiable;
otherwise, satisfiable.


An $\mathrm{NLL}_X$-substitution $\rho$ is a finite mapping from generalization variables
to $\mathrm{NLL}_X$-expressions. Substitutions act on expressions homomorphically
and this action extends to freshness constraints and contexts as follows:
$(a \# X)\rho = a \# (X\rho)$ and $\nabla\rho = \{a \# (e\rho) \mid a \# e \in \nabla\}$. We will denote the domain
of substitutions by $dom(\cdot)$. A substitution is ground if it maps (generalization)
variables to NLL-expressions. For a ground substitution $\rho$: $\nabla\rho$ is called valid iff
$\nabla\rho$ is consistent.


Permutations and Cycles. A cycle $\pi$ in $A$ is a permutation represented by
a sequence of different atoms $a_1, a_2, \ldots, a_n$, such that $\pi(a_i) = a_{i+1}$ for
$i = 1, \ldots, n-1$ and $\pi(a_n) = a_1$. As standard, such a cycle will be denoted as
$\pi = (a_1\ a_2 \ldots a_n)$. Every permutation $\pi$ has a representation $\pi_1 \pi_2 \ldots \pi_n$ (which
abbreviates $\pi_1 \circ \pi_2 \circ \ldots \circ \pi_n$) where the $\pi_i$ are disjoint (primitive) cycles.

The disjoint cycles can be permuted. For instance, the permutation
$(a\ b)(b\ d)(c\ e)$ has the cycle presentation $(a\ b\ d)(c\ e)$ which is the same as
$(c\ e)(a\ b\ d)$.
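
The example can be checked mechanically. The sketch below composes a list of swappings (rightmost applied first, as in $\pi_1 \circ \pi_2$) and extracts the disjoint cycles; the function names `compose_swaps` and `cycles` are hypothetical, and each cycle is written starting from its smallest atom so that the output is canonical.

```python
def compose_swaps(swaps):
    """Permutation (as a dict of moved atoms) for (a1 b1)(a2 b2)...;
    the rightmost swapping is applied first."""
    atoms = {x for pair in swaps for x in pair}
    pi = {}
    for x in atoms:
        y = x
        for a, b in reversed(swaps):
            y = b if y == a else a if y == b else y
        if y != x:
            pi[x] = y
    return pi

def cycles(pi):
    """Disjoint cycles of pi, each starting at its smallest atom."""
    seen, result = set(), []
    for start in sorted(pi):
        if start in seen or pi[start] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = pi[x]
        result.append(tuple(cycle))
    return result

pi = compose_swaps([("a", "b"), ("b", "d"), ("c", "e")])
print(cycles(pi))   # [('a', 'b', 'd'), ('c', 'e')]
```

Since disjoint cycles commute, the order of the tuples in the result carries no meaning; sorting by the smallest atom is just one convenient normal form.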


2.1 Data-Structures of Anti-unification Algorithms 


Anti-unification algorithms will produce as a result expressions that are
restricted by a freshness context. These are called expressions-in-context and
denoted as $(\nabla, s)$, where $\nabla$ is a freshness context and $s$ is an $\mathrm{NLL}_X$-expression.
The semantics of expressions-in-context follow the idea that syntactically
used names of atoms in expressions are fixed, and atoms occurring in $\nabla$, but not
in $s$, are viewed as existentially quantified: these are treated as arbitrary names
of atoms.


Definition 2. An expression-in-context is a pair $(\nabla, e)$, where $e$ is an expression
and $\nabla$ is a (consistent) freshness context. The semantics of $(\nabla, e)$ is the set
of ground instances of $e$ that satisfy $\nabla$, i.e.,


$$[\![(\nabla, e)]\!] = \{[r]_\alpha \mid \exists \rho : \forall a \in AT(e).\ a\rho = a \text{ and } [r]_\alpha = [e\rho]_\alpha \text{ and } \nabla\rho \text{ valid}\}$$


where $\rho$ is a mapping from $Var \cup A$ to ground expressions such that $\rho|_A$ is a
bijection on atoms.


The existential quantification on valid instances of expressions gives additional
power to the semantics of expressions-in-context: by considering $a$ as existentially
quantified, we obtain that $[\![(\{a \# X\}, X)]\!]$ is the same as $[\![(\emptyset, X)]\!]$.


Example 2. Consider the expression-in-context $(\{a \# X\}, f(X))$. We will argue
that $[\![(\{a \# X\}, f(X))]\!] = [\![(\emptyset, f(X))]\!]$. First, notice that $a$ does not occur
syntactically in $f(X)$ and therefore we can take $\rho$ mapping $a$ to an arbitrary atom
that does not break validity of $\nabla$. In fact:


- It is obvious that $[\![(\{a \# X\}, f(X))]\!] \subseteq [\![(\emptyset, f(X))]\!]$, since the left one has more
  restrictions on its elements than the right one.
- $[\![(\emptyset, f(X))]\!] \subseteq [\![(\{a \# X\}, f(X))]\!]$: Let $\rho$ be a bijection on atoms that is the
  identity on the atoms occurring in $f(X)$ (there is none). Then, we select
  $a\rho$ as an atom that does not occur in $f(X)\rho$, which trivially implies that $a\rho \# X\rho$ holds.


Our semantics for $(\{a \# X\}, X)$ differs from the one in Baumgartner et al. [3]
where $[\![(\{a \# X\}, X)]\!]_B$ is the set of all ground instances of $X$, where $a$ is
not permitted to occur free. This will induce the negative effect of properly
infinite descending chains¹ of expressions-in-context such as $\ldots \prec_B
(\{a \# X, b \# X\}, f(X)) \prec_B (\{a \# X\}, f(X)) \prec_B (\emptyset, f(X))$, which is eliminated in
our approach since all these expressions-in-context have the same semantics.


Next we define an order relation on expressions-in-context which establishes 
when one expression-in-context is more general or more specific than another. 


Definition 3 (Ordering, Generalization).


- An expression-in-context $(\Delta, r)$ is more specific (or less general) than an
  expression-in-context $(\nabla, s)$, denoted $(\nabla, s) \preceq (\Delta, r)$, if $[\![(\Delta, r)]\!] \subseteq [\![(\nabla, s)]\!]$.
  The strict part of $\preceq$ is denoted $\prec$. This defines equivalence of two
  expressions-in-context via their semantics: $(\nabla, s) \equiv (\nabla', t)$ if $[\![(\nabla, s)]\!] = [\![(\nabla', t)]\!]$.
- An expression-in-context $(\Delta, r)$ is a generalization of $(\nabla, s)$ and $(\nabla', t)$, if
  $(\Delta, r) \preceq (\nabla, s)$ and $(\Delta, r) \preceq (\nabla', t)$.
- A generalization $(\Delta', r')$ of $(\nabla, s)$ and $(\nabla', t)$ is the most specific (the least
  general) one, if for all generalizations $(\Delta, r)$ of $(\nabla, s)$ and $(\nabla', t)$, we have
  $(\Delta, r) \preceq (\Delta', r')$.


¹ $[\![\cdot]\!]_B$ and $\prec_B$ denote the semantics and order relation in [4], resp.


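For instance, the expression-in-context $(\emptyset, \lambda e.\mathrm{app}(e, X))$ is a generalization of
$(\emptyset, \lambda a.\mathrm{app}(a, c))$ and $(\emptyset, \lambda b.\mathrm{app}(b, Z))$, for a new atom $e$. It is easy to verify that
$(\emptyset, \lambda e.\mathrm{app}(e, X)) \preceq (\emptyset, \lambda a.\mathrm{app}(a, c))$ and $(\emptyset, \lambda e.\mathrm{app}(e, X)) \preceq (\emptyset, \lambda b.\mathrm{app}(b, Z))$.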


3 The Anti-unification Problem for NLLx 


We are interested in the anti-unification problem for NLLx:
Given two expressions-in-context (∇, s) and (∇, t),
Find a least general generalization, i.e., another expression-in-context (Δ, r) that
satisfies (Δ, r) ⪯ (∇, s) and (Δ, r) ⪯ (∇, t).

The challenge in treating letrec-expressions in anti-unification algorithms is,
on the one hand, their unusual scoping and, on the other hand, the multiple
possibilities to formulate the same problem in several syntactically different ways.


Remark 2 [Permutations in the generalization of suspensions]. Generalization
of suspensions, say (∅, π1·Z) and (∅, π2·Z), needs some preparation based on
properties of permutations: first, we decompose π1 and π2 into their cycle
presentation, say π1 = μ1 … μn and π2 = μ′1 … μ′m; second, we work on generalizing
(∅, μ1 … μn·Z) and (∅, μ′1 … μ′m·Z) as follows: let π3 be a permutation obtained
from the set of common cycles of π1 and π2, say π1 = π3π′1 and π2 = π3π′2.
Then, π3·X is a generalization for (∅, π1·Z) and (∅, π2·Z). In the following we
will denote the common cycles of permutations π1 and π2 as π1 ∩ π2. This will
be addressed in detail with the specific rule for suspensions in Fig. 2.
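The decomposition in Remark 2 can be illustrated with a small sketch. It assumes that a permutation is stored as a Python dict from atoms to atoms and that a cycle (c d e) maps c to d, d to e and e to c; the function names are hypothetical and not taken from the paper.

```python
def cycles(perm):
    """Non-trivial cycles of a permutation given as a dict atom -> atom.

    Every cycle starts with its smallest atom, so equal cycles compare equal.
    """
    seen, found = set(), set()
    for start in sorted(perm):
        if start in seen or perm[start] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        found.add(tuple(cycle))
    return found


def from_cycles(cs):
    """Build the dict representation from a collection of disjoint cycles."""
    perm = {}
    for c in cs:
        for i, x in enumerate(c):
            perm[x] = c[(i + 1) % len(c)]
    return perm


def act(perm, atom):
    """Apply a permutation to an atom (atoms outside the dict are fixed)."""
    return perm.get(atom, atom)


def compose(p, q):
    """The permutation a -> p(q(a)), stored without fixed points."""
    images = {a: act(p, act(q, a)) for a in set(p) | set(q)}
    return {a: b for a, b in images.items() if a != b}


def split_common(pi1, pi2):
    """Return (pi3, pi1', pi2') with pi3 = pi1 ∩ pi2 and pi_i = pi3 pi_i'."""
    pi3 = from_cycles(cycles(pi1) & cycles(pi2))
    inverse = {b: a for a, b in pi3.items()}
    return pi3, compose(inverse, pi1), compose(inverse, pi2)


pi1 = from_cycles([("a", "b"), ("c", "d", "e")])
pi2 = from_cycles([("a", "b"), ("c", "e", "d")])
pi3, rest1, rest2 = split_common(pi1, pi2)
```

Here the only common cycle is (a b), so pi3 swaps a and b, while rest1 and rest2 keep the two different 3-cycles that still have to be generalized.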


3.1 The Algorithm ANTIUNIFLETR and Its Rules 


We first define the nominal generalization algorithm ANTIUNIFLETR that (non- 
deterministically) computes a single generalization of the input expressions, 
where the generalization can also be nonlinear in the generalization variables 
due to merging. We will argue that the algorithm is sound and weakly complete, 
and one run can be performed in polynomial time. 

The data structure of the algorithm ANTIUNIFLETR is (Γ, M, ∇, L) where:


- Γ is a set of generalization triples of the form X : s ≜ t, where X is a fresh
(generalization) variable, and s, t are NLLx-expressions;

- M is a set of solved generalization triples;

- ∇ is a set of freshness constraints, without freshness constraints for the fresh
generalization variable for the input generalization triple;

- L is a substitution represented as a set of bindings; the empty set is [].
The result of applying the substitution L on the generalization variable X is
denoted as X ∘ L.
We call such a tuple a state. The rules of the algorithm ANTIUNIFLETR,
given in Fig. 2, operate on states, and ⊍ denotes disjoint union. Given two NLL
expressions s and t, and a freshness context Δ (possibly empty), to compute
generalizations for (Δ, s) and (Δ, t), we start with ({X : s ≜ t}; ∅; Δ; []), the
initial state (sometimes abbreviated to (Δ, {X : s ≜ t})), where X is a fresh
generalization variable, and we apply the rules from Fig. 2 and Fig. 4 until no
more rule applications are possible and we reach the final state, which has the
form (∅; M; ∇; L), where M must be completely merged. We denote the
computation from the initial to a final state by (Γ; ∅; Δ; []) ⟹* (∅; M; ∇; L).

The output is an expression-in-context obtained from the generated
substitution L and the final freshness constraint ∇, i.e., the output is (∇, X ∘ L), also
called the result computed by the ANTIUNIFLETR algorithm. We say the algorithm is
complete if every least general generalization (lgg) is found, and it is weakly complete
if every lgg is found up to some set of freshness constraints.
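To make the state-based formulation concrete, here is a minimal sketch of a run on the first-order fragment only. It assumes ground inputs without binders, letrec or suspensions, so ∇ is omitted and merging is restricted to the identity permutation; the tuple encoding of terms and all names are illustrative assumptions, not the paper's implementation.

```python
import itertools


class Var(str):
    """A generalization variable; atoms are plain str, f(t1,...,tn) is ("f", t1, ..., tn)."""


def resolve(term, subst):
    """Apply the bindings in L exhaustively (computes X ∘ L for term = X)."""
    if isinstance(term, Var):
        return resolve(subst[term], subst) if term in subst else term
    if isinstance(term, tuple):
        return (term[0], *(resolve(a, subst) for a in term[1:]))
    return term


def anti_unify(s, t):
    """Return (M, X ∘ L) for the initial state ({X : s ≜ t}; ∅; [])."""
    fresh = itertools.count(1)
    root = Var("X")
    gamma = [(root, s, t)]   # Γ: unsolved triples X : s ≜ t
    solved = []              # M: solved triples
    subst = {}               # L: bindings X ↦ term
    while gamma:
        x, s1, t1 = gamma.pop()
        same_head = (isinstance(s1, tuple) and isinstance(t1, tuple)
                     and s1[0] == t1[0] and len(s1) == len(t1))
        if same_head:
            # (Dec): one fresh variable per argument
            xs = [Var(f"X{next(fresh)}") for _ in s1[1:]]
            gamma.extend(zip(xs, s1[1:], t1[1:]))
            subst[x] = (s1[0], *xs)
        elif not isinstance(s1, tuple) and s1 == t1:
            # identical atoms, treated like (Dec) with n = 0
            subst[x] = s1
        else:
            # (Solve), followed by (Mer) restricted to the identity permutation
            for y, s2, t2 in solved:
                if (s2, t2) == (s1, t1):
                    subst[x] = y
                    break
            else:
                solved.append((x, s1, t1))
    return solved, resolve(root, subst)


solved, gen = anti_unify(("f", "c1", "a"), ("f", "c2", "a"))
```

For the two inputs f(c1, a) and f(c2, a) the run ends with one solved triple X1 : c1 ≜ c2 and the generalization f(X1, a).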


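(Dec): DECOMPOSITION
    {X : f(s1,…,sn) ≜ f(t1,…,tn)} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    Γ ∪ {X1 : s1 ≜ t1, …, Xn : sn ≜ tn}, M, ∇, L ∪ {X ↦ f(X1,…,Xn)}
    where the Xi are fresh variables; n = 0 is permitted


(Absaa): ABSTRACTION
    {X : λa.s ≜ λa.t} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    Γ ∪ {Y : s ≜ t}, M, ∇, L ∪ {X ↦ λa.Y}
    where Y is a fresh variable


(Absab): ABSTRACTION
    {X : λa.s ≜ λb.t} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    Γ ∪ {Y : (c a)·s ≜ (c b)·t}, M, ∇ ∪ {c#λa.s, c#λb.t}, L ∪ {X ↦ λc.Y}
    where Y is a fresh variable and c is a fresh atom


(SusYY): SUSPENSION YY
    {X : π1·Y ≜ π2·Y} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    {Z : π′1·Y ≜ π′2·Y} ∪ Γ, M, ∇, L ∪ {X ↦ π·Z}
    where π = π1 ∩ π2, π′1 = π⁻¹π1, π′2 = π⁻¹π2


(Mer): MERGING
    Γ, {X : s1 ≜ t1, Y : s2 ≜ t2} ⊍ M, ∇, L
    ----------------------------------------------------------------
    Γ, M ∪ {X : s1 ≜ t1}, ∇, L ∪ {Y ↦ π·X}
    where EQVM({(s1, t1) ≼ (s2, t2)}) = π


(SolveYY)
    {X : π1·Y ≜ π2·Y} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    Γ, M ∪ {X : π1·Y ≜ π2·Y}, ∇, L
    where π1 ≠ π2 and π1 ∩ π2 = ∅


(Solve)
    {X : s ≜ t} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    Γ, M ∪ {X : s ≜ t}, ∇, L
    where Head(s) ≠ Head(t), or s, t are letrec-expressions
    with a different number of bindings


Fig. 2. Rules of the algorithm ANTIUNIFLETR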


Rules in Fig. 2 are similar to the ones in [3] without the parameter for the
set of atoms occurring in the initial state and throughout the computation, and
deal with abstractions, function application, and suspensions. The subalgorithm
EQVM, defined by the rules in Fig. 3, computes a matching permutation, say π, of
    W ⊍ {f(s1,…,sn) ≼ f(s′1,…,s′n)}
    ----------------------------------------
    W ∪ {s1 ≼ s′1, …, sn ≼ s′n}

    W ⊍ {λa.s ≼ λa.t}
    ----------------------------------------
    W ∪ {s ≼ t}

    W ⊍ {λa.s ≼ λb.t}      b#λa.s
    ----------------------------------------
    W ∪ {(a b)·s ≼ t}

    W ⊍ {λa.s ≼ λb.t}      a#λb.t
    ----------------------------------------
    W ∪ {s ≼ (a b)·t}


1. Exhaustively apply the rules above. If after the application W contains pairs not
of the form a ≼ b, then Fail.

2. Let W = {a1 ≼ b1, …, an ≼ bn}. If the mapping {φ : ai ↦ bi | i =
1,…,n, where ai ≼ bi ∈ W} is not injective, then Fail.

3. Return the bijective mapping π generated by ai ↦ bi, i = 1,…,n.


Fig. 3. The permutation matching (sub-)algorithm EQVM


(Letraa): LETREC WITH ORDERED ATOMS
    {X : letr a1.s1;…;an.sn in s ≜ letr a1.t1;…;an.tn in t} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    Γ ∪ {X1 : s1 ≜ t1, …, Xn : sn ≜ tn, Y : s ≜ t}, M, ∇,
    L ∪ {X ↦ letr a1.X1;…;an.Xn in Y}


(Letperm): LETREC WITH PERMUTED BINDINGS IN ENVIRONMENTS
    {X : letr a1.s1;…;an.sn in s ≜ letr b1.t1;…;bn.tn in t} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    {X : letr a1.s1;…;an.sn in s ≜ letr bρ(1).tρ(1);…;bρ(n).tρ(n) in t} ∪ Γ, M, ∇, L
    where ρ is a permutation on {1,…,n}


(Letrab): LETREC WITH ATOMS IN ENV SWAPPED WITH NEW NAMES
    {X : letr a1.s1;…;an.sn in s ≜ letr b1.t1;…;bn.tn in t} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    {X : π1·(letr a1.s1;…;an.sn in s) ≜ π2·(letr b1.t1;…;bn.tn in t)} ∪ Γ, M, ∇ ∪ ∇′, L
    where ∇′ = {ci#(letr a1.s1;…;an.sn in s)} ∪ {ci#(letr b1.t1;…;bn.tn in t)},
    π1 = (a1 c1)…(an cn), π2 = (b1 c1)…(bn cn), and the ci are fresh and different atoms


Fig. 4. Rules for letrec of the algorithm ANTIUNIFLETR


two expressions-in-context (say s ≼ t in W with context ∇), where EQVBIEX
checks whether the set of swappings is injective and then adds a minimal set of
mappings such that the result is a bijection, i.e. a permutation (on atoms). Rules
in Fig. 4 are new and will be described in detail:


Rule (Letraa) acts as a decomposition rule with the letr construct and can 
only be applied if the bindings in the environment are the same, respecting 
the given order. 

Rule (Letperm) is branching and exhaustively tries to generalize the
expressions by considering all permutations of the letr environment.

Rule (Letrab) deals with renaming of bound names; it consistently swaps the 
binding atoms of the letr environment with fresh names and propagates the 
obtained permutation throughout both expressions. 
The latter rule exploits the following idea: if λa.s and λb.t are α-equivalent, then
one can rename a and b with the same fresh name c and propagate the renaming
within s and t and still obtain α-equivalent expressions.
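The permutation matching of Fig. 3 and the subsequent extension to a bijection can be sketched for ground terms as follows. The encoding (atoms as strings, ("lam", a, body) for abstractions, (f, t1, …, tn) for applications) and the function names are assumptions made for this illustration; variables, suspensions and letrec are left out.

```python
def swap(a, b, t):
    """Apply the swapping (a b) to every atom of t, binders included."""
    if isinstance(t, str):
        return b if t == a else a if t == b else t
    if t[0] == "lam":
        return ("lam", swap(a, b, t[1]), swap(a, b, t[2]))
    return (t[0], *(swap(a, b, x) for x in t[1:]))


def free_atoms(t):
    if isinstance(t, str):
        return {t}
    if t[0] == "lam":
        return free_atoms(t[2]) - {t[1]}
    return set().union(*(free_atoms(x) for x in t[1:]))


def eqvm(s, t):
    """Return a dict atom -> atom matching s onto t, or None on failure."""
    work, pairs = [(s, t)], set()
    while work:
        l, r = work.pop()
        if isinstance(l, str) and isinstance(r, str):
            pairs.add((l, r))                      # pair of the form a ≼ b
        elif isinstance(l, str) or isinstance(r, str):
            return None
        elif l[0] == "lam" and r[0] == "lam":
            a, b = l[1], r[1]
            if a == b:
                work.append((l[2], r[2]))
            elif b not in free_atoms(l):           # side condition b#λa.s
                work.append((swap(a, b, l[2]), r[2]))
            elif a not in free_atoms(r):           # side condition a#λb.t
                work.append((l[2], swap(a, b, r[2])))
            else:
                return None
        elif l[0] == r[0] and l[0] != "lam" and len(l) == len(r):
            work.extend(zip(l[1:], r[1:]))
        else:
            return None
    mapping = dict(pairs)
    if len(mapping) < len(pairs) or len(set(mapping.values())) < len(mapping):
        return None                                # not a function, or not injective
    return mapping


def complete(mapping):
    """Extend an injective partial map to a permutation by closing every chain."""
    perm = dict(mapping)
    inverse = {b: a for a, b in mapping.items()}
    for end in set(mapping.values()) - set(mapping):
        start = end
        while start in inverse:
            start = inverse[start]
        perm[end] = start
    return perm
```

For instance, matching λa.f(a, c) against λb.f(b, d) yields the partial map b ↦ b, c ↦ d, which `complete` extends by d ↦ c, i.e. the swapping (c d).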


Example 3. A generalization for the expressions-in-context (∅, letr a.a;b.c
in f(a,b)) and (∅, letr b.a;c.c in f(a,b)) is computed as follows:


1. We cannot apply rule (Letraa) since the binding atoms in the environments
do not correspond to each other. We may rearrange the bindings using
(Letperm). Then we apply rule (Letrab) for renaming: we choose d, e as fresh
atoms and use the renamings (a d)(b e) and (c d)(b e), which leads to the check
∇′ = {d, e#(letr a.a;b.c in f(a,b))} ∪ {d, e#(letr c.c;b.a in f(a,b))} = ∅,
which holds and evaluates to ∅, since the terms are ground. After an
application of (Letraa), which decomposes the letrec environments, we have:


    ({X : letr a.a;b.c in f(a,b) ≜ letr b.a;c.c in f(a,b)}, ∅, ∅, [])
⟹ ({X : letr a.a;b.c in f(a,b) ≜ letr c.c;b.a in f(a,b)}, ∅, ∅, [])
⟹ ({X : letr d.d;e.c in f(d,e) ≜ letr d.d;e.a in f(a,e)}, ∅, ∅, [])
⟹ ({X1 : d ≜ d, X2 : c ≜ a, Y : f(d,e) ≜ f(a,e)}, ∅, ∅, {X ↦ letr d.X1;e.X2 in Y})


2. After three applications of (Dec), one (Solve) and one (Mer) we obtain
(∅, {X2 : c ≜ a}, ∅, {X ↦ letr d.d;e.X2 in f((c d)·X2, e)}). The output
generalization is (∅, letr d.d;e.X2 in f((c d)·X2, e)).


Another Solution: from (X : letr a.a;b.c in f(a,b) ≜ letr b.a;c.c in f(a,b))
we could have immediately applied the rule (Letrab) using π1 = (a d)(b e) for
the left and π2 = (b d)(c e) for the right expression. This finally leads to a
generalization of the form letr d.X1;e.X2 in f(X3, X4) which is "weaker" (too
general) than the one above.

Note that the environment of one of the expressions to be generalized
contains garbage: the binding c.c is not used in f(a,b).


Theorem 1. The algorithm ANTIUNIFLETR is terminating and sound. A single 
run requires polynomial time. The overall computation requires exponential time 
and may compute an exponential number of generalizations. 


Proof. Soundness and termination can be easily checked by inspection of the
rules of Figs. 2, 3 and 4. The number of nondeterministic alternatives is
exponential in the worst case, and it is induced by the rule (Letperm). A single run
(one branch) can be performed in polynomial time.


Notice that except for rule (Letrab), all the rules of the ANTIUNIFLETR algorithm
preserve the context ∇. This differs from the approach taken in [3], which might
add new freshness constraints with a rule similar to our rule (SolveYY), based on
a set A of all atoms appearing throughout the computation of a generalization.
We show in the next example that this choice of initially preserving the freshness
context leads to a weak completeness result, but completeness is regained with
a specialization algorithm that will be presented next.
Example 4 (Weak Completeness). The expressions-in-context (∅, f(c1, a)) and
(∅, f(c2, a)) have the generalization (∅, f(X1, a)) computed by the rules of Fig. 2.
However, this is not the lgg since ({a#X1}, f(X1, a)) is a more specific
generalization. In fact, f(a, a) ∈ ⟦(∅, f(X1, a))⟧, but f(a, a) ∉ ⟦({a#X1}, f(X1, a))⟧.


Theorem 2 (Weak Completeness). Given NLLx expressions e and e′ and a
freshness context Δ, if (∇′, r) is a generalization of (Δ, e) and (Δ, e′), then there
exists a ∇″ and a derivation ({X : e ≜ e′}, ∅, Δ, []) ⟹* (∅, M, ∇, σ) such that
(∇ ∪ ∇″, Xσ) is a generalization of (Δ, e) and (Δ, e′) and (∇ ∪ ∇″, Xσ) ⪯ (∇′, r).


Proof. The proof is by induction on the structure of r. 


Example 5 (Cont. Example 4). We remark another behaviour that can be seen
from the execution of ANTIUNIFLETR: ({X : f(c1, a) ≜ f(c2, a)}, ∅, ∅, []) reduces
to (∅, {X1 : c1 ≜ c2}, ∅, {X ↦ f(X1, a)}). Notice that (i) f(a, a) is clearly not
an element of ⟦(∅, f(c1, a))⟧ nor ⟦(∅, f(c2, a))⟧; (ii) the information that c1 and
c2 were free names in the input problem was "forgotten" by the generalization
f(X1, a), but it can be retrieved from the solved triple in the final state; (iii)
a#c1 and a#c2 hold trivially.


3.2 From Weak Completeness to Completeness 


Given a result (V,s) of a run of the algorithm ANTIUNIFLETR, the result is 
in general only weakly complete, since the expressivity of the language may 
permit a better generalization. The true most specific generalization may have 
additional freshness constraints, as it was shown in Example 4. The problem of 
specializing the generalizer output by ANTIUNIFLETR is subtle: a different but 
related behaviour can be seen with the next example. 


Example 6. Consider the expressions-in-context (∅, f(g(c1, a), a)) and (∅, f(c2,
a)) as input for ANTIUNIFLETR. The output generalization is (∅, f(X1, a)), and
this is the lgg. In fact, a run of the algorithm would terminate with the final
state (∅, {X1 : g(c1, a) ≜ c2}, ∅, {X ↦ f(X1, a)}).

We can use the information in the solved part of the final state to build the
substitutions σ1 = {X1 ↦ g(c1, a)} and σ2 = {X1 ↦ c2} that instantiate the
generalization f(X1, a) back to the input terms. Notice that a#X1σ1 is equal to
a#g(c1, a) and does not hold. Thus, we cannot add {a#X1} as a constraint to the
generalization, since ({a#X1}, f(X1, a)) cannot be instantiated to f(g(c1, a), a).


Let γ = (∅; M; ∇; L) be a final state. We define AT(γ) as the set of unbound
atoms that occur in M, ∇ or codom(L). We say that a generalization variable X
occurs in γ when it occurs in ∇, or as a subterm in M, or in codom(L).
Algorithm 1 ANTIUNIFLETR - Phase 2

Input: (Δ, s) and (Δ, t)

({X : s ≜ t}; ∅; Δ; []) ⟹* γ = (∅; M; ∇; L)

Let (∇, r) = (∇, X ∘ L) be the resulting generalization.

Let X be a generalization variable occurring in r.        ▷ Repeat for each X

if a ∈ AT(r) \ BA(r) and a ∉ RelAtoms_γ(X) then           ▷ Repeat for each a ∈ AT(r)
    ∇ := ∇ ∪ {a#X}

end if


Definition 4 (Relevant Atoms). Let γ = (∅; M; ∇; L) be a final state in a
run of ANTIUNIFLETR. Let X be a generalization variable occurring in γ. The
set of relevant atoms for X, denoted RelAtoms_γ(X), is defined recursively:


- If there is no solved triple for X in M, then the relevant atoms are
RelAtoms_γ(X) = AT(γ) \ {a | a#X ∈ ∇}, i.e., all atoms that are not bound
and that occur syntactically in the state, but not the atoms that were excluded
due to the freshness constraints in ∇.

- If there is a solved triple X : s ≜ t ∈ M, then RelAtoms_γ(X) =
RelAtoms_γ(s) ∪ RelAtoms_γ(t). The other cases are defined recursively in the
structure of the expression:

RelAtoms_γ(a) = {a}, RelAtoms_γ(f s1 … sn) = ⋃i RelAtoms_γ(si);

RelAtoms_γ(π·s) = π·RelAtoms_γ(s);

RelAtoms_γ(λa.s) = RelAtoms_γ(s) \ {a}; and

RelAtoms_γ(letr a1.s1;…;an.sn in r) = RelAtoms_γ(s1,…,sn,r) \ {a1,…,an}.


For example, if we take M = {X : f(a, b) ≜ g((a c)·Y), Y : f(c, d) ≜ g(e)} and
∇ = {a#Y}, then the set of relevant atoms for Y is {c, d, e}, and for X it is
{a, b} ∪ (a c)·{c, d, e} = {a, b, d, e}, where it is noteworthy that atom c is missing.
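The recursive definition of relevant atoms can be checked mechanically on this example. The sketch below assumes a tuple encoding of expressions (("susp", swaps, s) for π·s, ("lam", a, s), ("letr", bindings, r), and (f, s1, …, sn) otherwise) and takes AT(γ) as an explicit argument; these names and encodings are illustrative assumptions only.

```python
class Var(str):
    """A generalization variable; atoms are plain str."""


def swap_all(swaps, atom):
    """Apply a product of swappings [(a, b), ...] to an atom, rightmost first."""
    for x, y in reversed(swaps):
        atom = y if atom == x else x if atom == y else atom
    return atom


def make_rel_atoms(solved, nabla, at_gamma):
    """solved: dict X -> (s, t) for M; nabla: set of pairs (a, X) for a#X."""
    def rel(t):
        if isinstance(t, Var):
            if t in solved:
                s, u = solved[t]
                return rel(s) | rel(u)
            return at_gamma - {a for a, x in nabla if x == t}
        if isinstance(t, str):
            return {t}
        if t[0] == "susp":
            return {swap_all(t[1], a) for a in rel(t[2])}
        if t[0] == "lam":
            return rel(t[2]) - {t[1]}
        if t[0] == "letr":
            inner = rel(t[2]).union(*(rel(s) for _, s in t[1]))
            return inner - {a for a, _ in t[1]}
        return set().union(*(rel(x) for x in t[1:]))
    return rel


X, Y = Var("X"), Var("Y")
M = {
    X: (("f", "a", "b"), ("g", ("susp", [("a", "c")], Y))),
    Y: (("f", "c", "d"), ("g", "e")),
}
rel = make_rel_atoms(M, {("a", Y)}, {"a", "b", "c", "d", "e"})
```

With these inputs, `rel(Y)` is {c, d, e} and `rel(X)` is {a, b, d, e}, in agreement with the text.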

We formulate a postprocessing algorithm (Algorithm 1) for ANTIUNIFLETR 
which is able to compute least general generalizations. 


Theorem 3. Adding Algorithm 1 makes ANTIUNIFLETR complete.


Note, however, that due to the non-determinism, it may be possible that one 
of the runs generates a generalization that is strictly less specific than the result 
in another run, see Example 3. 


Example 7. This example shows the result of generalizing more complex expres- 
sions. Consider the generalization problem, and the sequence of generalization 
steps, where the last step abbreviates several steps. 


    ({X1 : λa.f(a, a, c) ≜ λb.f(b, d, c)}, ∅, [])
⟹ ({X1 : λe.f(e, e, c) ≜ λe.f(e, d, c)}, ∅, [])
⟹ ({X2 : f(e, e, c) ≜ f(e, d, c)}, ∅, {X1 ↦ λe.X2})
⟹* (∅, {X3 : e ≜ d}, {X1 ↦ λe.X2, X2 ↦ f(e, X3, c)})

Now the resulting lgg can be computed by adding only one freshness constraint:
({c#X3}, λe.f(e, X3, c)). This holds, since d ∈ RelAtoms_γ(X3), and hence does
not occur in the freshness context. Notice that c#X3 is added as a freshness
constraint since c occurs in the generalization expression, but c ∉ RelAtoms_γ(X3).


4 Generalization Algorithm Under Semantic Equalities 


We use semantic equivalences to specialize and extend our anti-unification algo- 
rithm to ground expressions. In particular, we exploit the fact that removal of 
garbage is semantically correct: it does not alter the meaning of the program. 
First, we develop a standardization algorithm for garbage-free expressions that 
helps in comparing the letrec-expressions and computing generalizations in poly- 
nomial time. Second, we propose a variation of our anti-unification algorithm 
called ANTIUNIFNOGARBAGE. 

NLL-expressions may contain irrelevant bindings in the letrec environment: 
for instance, in (letr a.Nil;b.b in f(a,a)), the binding b.b is useless for the 
expression, and will be considered as garbage. The garbage bindings do not
contribute to the meaning of the functional expressions. It is shown in [18] that
α-equivalence of garbage-free letrec-expressions can be checked in polynomial
time, and that, in general, this problem is graph-isomorphism-complete [2, 20].


Definition 5. Let e be an NLL-expression. We say that e contains garbage
iff there is a subexpression (letr a1.e1;…;an.en in e′) in e such that the
environment a1.e1;…;an.en can be split into two nonempty sub-environments
a_i1.e_i1;…;a_ik.e_ik and a_j1.e_j1;…;a_jk′.e_jk′, and the binding atoms a_ih, h =
1,…,k do not occur free in letr a_j1.e_j1;…;a_jk′.e_jk′ in e′. We say that e is
garbage-free (or garbage-collected) iff it does not contain garbage.


Making an expression garbage-free may require an iterated removal of 
garbage, using the garbage removal rewriting rules below: 


(gr1) letr a1.e1;…;an.en;b1.e′1;…;bm.e′m in e′m+1 →
      letr b1.e′1;…;bm.e′m in e′m+1,   if ⋃_{i=1}^{m+1} FA(e′i) ∩ {a1,…,an} = ∅


(gr2) letr a1.e1;…;an.en in e → e,   if FA(e) ∩ {a1,…,an} = ∅
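A possible way to apply (gr1) and (gr2) exhaustively is to keep exactly those bindings that are reachable from the body. The following sketch assumes ground expressions encoded as tuples (("letr", ((a1, e1), …), e), ("lam", a, e), (f, e1, …, en), atoms as strings) with pairwise different binding atoms per environment; it is an illustration under these assumptions, not the authors' implementation.

```python
def free_atoms(t):
    """FA(t), with letr binding its atoms in all right-hand sides and in the body."""
    if isinstance(t, str):
        return {t}
    if t[0] == "lam":
        return free_atoms(t[2]) - {t[1]}
    if t[0] == "letr":
        inner = free_atoms(t[2]).union(*(free_atoms(s) for _, s in t[1]))
        return inner - {a for a, _ in t[1]}
    return set().union(*(free_atoms(x) for x in t[1:]))


def collect_garbage(t):
    """Remove garbage bottom-up from every letr in t."""
    if isinstance(t, str):
        return t
    if t[0] == "lam":
        return ("lam", t[1], collect_garbage(t[2]))
    if t[0] != "letr":
        return (t[0], *(collect_garbage(x) for x in t[1:]))
    env = [(a, collect_garbage(s)) for a, s in t[1]]
    body = collect_garbage(t[2])
    rhs = dict(env)
    bound = set(rhs)
    # bindings reachable from the body are live; everything else is garbage
    live, todo = set(), list(free_atoms(body) & bound)
    while todo:
        a = todo.pop()
        if a not in live:
            live.add(a)
            todo.extend(free_atoms(rhs[a]) & bound)
    kept = tuple((a, s) for a, s in env if a in live)
    # (gr1) drops the dead bindings; (gr2) drops the whole letr if none is live
    return ("letr", kept, body) if kept else body


example = ("letr", (("a", ("Nil",)), ("b", "b")), ("f", "a", "a"))
cleaned = collect_garbage(example)
```

On letr a.Nil; b.b in f(a, a) the binding b.b is removed and the binding for a is kept.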


We illustrate our ideas for the generalization of garbage-free expressions. Note 
that the used equality of expressions makes a notable difference for the results 
as well as for the algorithmic steps. 


Example 8. Let s = let c.a in f(g(c)) and t = let d.b in f(h(d)) be two
garbage-free ground expressions. A generalization of s and t w.r.t. ∼α is
s′ = let c.X1 in f(X2), which is also an lgg. If we allowed more equalities
on the expressions, like ∼gc as a part of the equality or even an equality
∼α,gc,letcp that also allows copying let-bindings, then s would be equivalent to
f(g(a)) and t equivalent to f(h(b)), which have f(X) as a generalization. The
generalization algorithm, however, would be much more complex.
(Letrnm): Letrecnm (for m < n)
    {X : letr a_{n-m+1}.s_{n-m+1};…;a_n.s_n in s ≜ letr a1.t1;…;an.tn in t} ⊍ Γ, M, ∇, L
    ----------------------------------------------------------------
    {X : letr a1.a1;…;a_{n-m}.a_{n-m};a_{n-m+1}.s_{n-m+1};…;a_n.s_n in s
         ≜ letr a1.t1;…;an.tn in t} ∪ Γ, M, ∇, L


Fig. 5. Different lengths of letrec-environments in ANTIUNIFLETR


The next step is to standardize the sequence of bindings in garbage-collected 
expressions, which greatly supports further operations. 


Standardization Algorithm. Let let a1.e1;…;an.en in e be a garbage-free
NLL-expression. Then, rearrange the bindings as follows:


1. Let aj be the atom from {a1,…,an} that has the earliest occurrence as a
free atom in the expression e, in its printed string. Then select aj.ej as the
leftmost binding in the fresh environment, i.e. r0 = e; r1 = letr aj.ej in e.

2. Iterate this to compute rk from rk-1 = letr envk-1 in e by selecting among
the remaining binding atoms ai ∈ {a1,…,an} \ {aj} again the one which first
occurs free in the printed string of rk-1, and then add ai.ei as the leftmost
binding in the letr-environment, obtaining rk = letr ai.ei; envk-1 in e.


These steps are to be used iteratively: apply them to the smallest
subexpression e′ of e which is not yet correctly arranged. The result is a gc-standardized
expression t_gcst of t.


Example 9. Consider the garbage-free expression let a.app(b, λc.c); b.λd.d in a,
where app is a binary function symbol for denoting the usual application of
the lambda calculus. The standardization algorithm returns the gc-standardized
expression let b.λd.d; a.app(b, λc.c) in a.
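The rearrangement can be sketched in code as follows. The sketch assumes garbage-free ground expressions in a tuple encoding (("letr", ((a1, e1), …), e), ("lam", a, e), (f, e1, …, en), atoms as strings) and reads "printed string" as the left-to-right order bindings-then-body; both the encoding and the function names are assumptions made for illustration.

```python
def free_occurrences(t, bound=frozenset()):
    """Free atoms of t, listed in the order of their occurrence in the printed string."""
    if isinstance(t, str):
        return [] if t in bound else [t]
    if t[0] == "lam":
        return free_occurrences(t[2], bound | {t[1]})
    if t[0] == "letr":
        inner = bound | {a for a, _ in t[1]}
        out = []
        for _, s in t[1]:
            out += free_occurrences(s, inner)
        return out + free_occurrences(t[2], inner)
    out = []
    for x in t[1:]:
        out += free_occurrences(x, bound)
    return out


def standardize_top(env, body):
    """Reorder the bindings of a garbage-free `letr env in body` (top level only)."""
    rhs = dict(env)
    remaining = set(rhs)
    ordered = []                      # env_k, leftmost binding first
    while remaining:
        current = ("letr", tuple(ordered), body) if ordered else body   # r_{k-1}
        # garbage-freeness guarantees that some remaining atom occurs free here
        nxt = next(a for a in free_occurrences(current) if a in remaining)
        ordered.insert(0, (nxt, rhs[nxt]))
        remaining.remove(nxt)
    return ("letr", tuple(ordered), body)


def standardize(t):
    """Standardize all letr environments, smallest subexpressions first."""
    if isinstance(t, str):
        return t
    if t[0] == "lam":
        return ("lam", t[1], standardize(t[2]))
    if t[0] == "letr":
        env = [(a, standardize(s)) for a, s in t[1]]
        return standardize_top(env, standardize(t[2]))
    return (t[0], *(standardize(x) for x in t[1:]))


app_b = ("app", "b", ("lam", "c", "c"))
example9 = ("letr", (("a", app_b), ("b", ("lam", "d", "d"))), "a")
result = standardize(example9)
```

For the expression of Example 9 the binding for b is moved to the front, and the same result is obtained when the input bindings are given in the other order.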


Proposition 1. For every garbage-free NLL-expression e, the gc-standardized
expression e′ of e with e ∼α e′ has a sequence of bindings in all letrec
environments that is unique and has a fixed ordering. The computation can be done
in polynomial time.


Proof. Garbage collection is polynomial: after every step the expression will be 
smaller, and a single step of detecting a set of redundant bindings is also poly- 
nomial. The rearrangement also can be done first for subexpressions of smaller 
size, and a single rearrangement of the top binding takes polynomial time. 


4.1 Anti-unification of Garbage-Free Expressions 


In this and the next subsection on generalization we will use a syntactically fixed
ordering of bindings in let environments, and denote this as letf.
ANTIUNIFLETR is adapted to the ground situation in several aspects: (i)
there are no freshness constraints; (ii) expressions are first gc-standardized; (iii)
we permit that n ≥ 2 expressions are to be generalized in one step; (iv) in a set
of expressions to be generalized, we make all top-level letrec environments
of the same (minimal) length by adding bindings a.a with fresh atoms a; and
(v) we fix the sequence of bindings in a let, indicated by letf.

We remark that an iterated generalization of pairs (i.e., to generalize s1, s2
and s3 one first generalizes s1 and s2, and from the result, say r, one repeats the
generalization process with r and s3) has the disadvantage that from the second
step on, after the first rule application, there are generalization variables, and
the semantic properties get lost, which means that, e.g., the standardization is
no longer usable, and so the method no longer works properly in the next
generalization steps.

Therefore, for generalizing more than 2 expressions, the data structure
adopted is: the generalized state is ({X : s1 ≜ … ≜ sn}; M; ∇; L), and we
use generalization tuples of the form {X : s1 ≜ … ≜ sn} to denote that X is a
variable generalizing expressions s1,…,sn. Examples for the modified rules are


(Decn)
    {X : f(s1,1,…,s1,n) ≜ … ≜ f(sm,1,…,sm,n)} ⊍ Γ, M, L
    ----------------------------------------------------------------
    Γ ∪ {X1 : s1,1 ≜ … ≜ sm,1, …, Xn : s1,n ≜ … ≜ sm,n}, M, L ∪ {X ↦ f(X1,…,Xn)}
    where the Xi are fresh variables; n = 0 is permitted


(Absaan)
    {X : λa.s1 ≜ … ≜ λa.sn} ⊍ Γ, M, L
    ----------------------------------------------------------------
    Γ ∪ {Y : s1 ≜ … ≜ sn}, M, L ∪ {X ↦ λa.Y}


(Mern)
    Γ, {X : s1 ≜ … ≜ sn, Y : t1 ≜ … ≜ tn} ⊍ M, L
    ----------------------------------------------------------------
    Γ, M ∪ {X : s1 ≜ … ≜ sn}, L ∪ {Y ↦ π·X}
    where EQVM({(s1,…,sn) ≼ (t1,…,tn)}) = π


Thus, we adapt the rules of ANTIUNIFLETR: it accepts n ≥ 2 ground
expressions; the permutation rule (Letperm) is inactive due to fixing the ordering of
bindings; merging is supported, and the subalgorithms EQVM and EQVBIEX are
almost trivial and applied to larger tuples. Also the sequence of bindings in lets
is fixed. All these adaptations can be done within the polynomial complexity.

These explanations suggest the algorithm ANTIUNIFNOGARBAGE, for n ≥
2 (ground) arguments, operating on a triple (Γ, M, L). It is defined
non-deterministically, but only one run will be done.


Example 10 (Fixed letr bindings). Generalizing the garbage-collected
expressions let a′.a;b′.b;c′.c in f(g(a′,b′,c′)) and let a′.b;b′.c;c′.a in f(h(a′,b′,c′))
produces let a′.a;b′.b;c′.c in f(X) since bindings can be rearranged, which
requires exponential complexity for trying rearrangements. If we fix the
sequence of bindings and generalize, then the algorithm requires only
polynomial time in this step: for letf a′.a;b′.b;c′.c in f(g(a′,b′,c′)) and
letf a′.b;b′.c;c′.a in f(h(a′,b′,c′)), we obtain letf a′.X1;b′.X2;c′.X3 in f(X).
Theorem 4. Algorithm ANTIUNIFNOGARBAGE is sound, terminating and
complete. It will compute a single least general generalization in polynomial time.


Proof (Sketch). The main argument is that if no rule applies, then the result is 
already a generalization. Second, every applied rule keeps the semantics, i.e., does 
not lose information. The complexity has two components: one is the preparation 
of the input, which is polynomial. The second part is the test and computation 
of every rule, which is polynomial since there are no ∇-sets, and the execution
of every rule requires polynomial time in the input size. Moreover, the size of 
the problem is decreased in every step. 


4.2 Exploiting Semantic Equalities 


Since we focus on applications of the algorithms in (functional) higher-order
programming languages, it makes sense to take more semantic equations and
properties into account to recognize semantic equality of syntactically different
expressions, which improves the power of generalization algorithms.

Since there are various approaches and definitions to semantics, like variants 
of contextual equivalences or bisimulations [9,14,15,19] and we want to be con- 
sistent with most of them, we only investigate the equalities that are correct in 
a majority of the cases. By “cases” we mean different programming languages 
permitting letr, but with different operational and equational semantics. 

The following semantically correct equalities, expressed as rewrite rules, in 
languages with letrec could also be used for further standardization of expres- 
sions, where we assume that there are no conflicts with variable names. 


x.f(s1,…,sn) → x.f(y1,…,yn); y1.s1; …; yn.sn

let x = (letr env in r); env′ in s → let x = r; env; env′ in s

let env in (let env′ in s) → let env; env′ in s

f (let env in s1) s2 → let env in (f s1 s2)


Note that these equalities, if used to standardize expressions, keep the
polynomial complexity of generalizations of ground expressions.


5 Conclusion and Future Work 


We formulated an anti-unification algorithm for expressions in a functional 
higher-order language with a let constructor that has mutually recursive bind- 
ings. We constructed a weakly complete anti-unification algorithm that in the 
general case is finitary, which is improved to being complete by a post-processing. 
In the worst case, the time for the computation as well as the number of gener- 
alizations are exponential. 

If the expressions are specialized to be ground and garbage-free, then the
problem becomes unitary and the computation is polynomial. These properties
make the method more friendly to applications. We also considered
modifications of the generalization algorithm for functions in functional programming
languages with letr that have a wider coverage by abstracting from the
syntactical details and by observing semantic equalities.

Further work is to generalize algorithms to other patterns and to experiment 
with the generalization method in practice. 
Abstract. Numerous confluence criteria for plain term rewrite systems
are known. For logically constrained rewrite systems, an attractive extension
of term rewriting in which rules are equipped with logical constraints,
much less is known. In this paper we extend the strongly-closed
and (almost) parallel-closed critical pair criteria of Huet and Toyama to
the logically constrained setting. We discuss the challenges for automation
and present crest, a new tool for logically constrained rewriting in
which the confluence criteria are implemented, together with experimental
data.
1 Introduction


Logically constrained rewrite systems constitute a general rewrite formalism with 
native support for constraints that are handled by SMT solvers. They are use- 
ful for program analysis, as illustrated in numerous papers [2,3,5,13]. Several 
results from term rewriting have been lifted to constrained rewriting. We men- 
tion termination analysis [6,7,12], rewriting induction [3], completion [12] as well 
as runtime complexity analysis [13]. 

In this paper we are concerned with confluence analysis of logically
constrained rewrite systems (LCTRSs for short). Only two sufficient conditions for
confluence of LCTRSs are known. Kop and Nishida considered (weak)
orthogonality in [8]. Orthogonality is the combination of left-linearity and the absence
of critical pairs; in a weakly orthogonal system trivial critical pairs are allowed.
Completion of LCTRSs is the topic of [12] and the underlying confluence con- 
dition of completion is the combination of termination and joinability of critical 
pairs. In this paper we add two further confluence criteria. Both of these extend 
known conditions for standard term rewriting to the constrained setting. The 
first is the combination of linearity and strong closedness of critical pairs, intro- 
duced by Huet [4]. The second, also due to [4], is the combination of left-linearity 
and parallel closedness of critical pairs. We also consider an extension of the lat- 
ter, due to Toyama [11]. 
Overview. The remainder of this paper is organized as follows. In the next
section we summarize the relevant background. Section 3 recalls the existing
confluence criteria for LCTRSs and some of the underlying results. The new
confluence criteria for LCTRSs are reported in Sect. 4. In Sect. 5 the automation
challenges we faced are described and we present our prototype implementation
crest. Experimental results are reported in Sect. 6, before we conclude in Sect. 7.


2 Preliminaries 


We assume familiarity with the basic notions of term rewrite systems (TRSs) [1], 
but shortly recapitulate terminology and notation that we use in the remainder. 
In particular, we recall the notion of logically constrained rewriting as defined 
in [3,8]. 

We assume a many-sorted signature F and a set V of (many-sorted) variables
disjoint from F. The signature F is split into term symbols from F_te and theory
symbols from F_th. The set T(F, V) contains the well-sorted terms over this
signature and T(F_th) denotes the set of well-sorted ground terms that consist
entirely of theory symbols. We assume a mapping I which assigns to every
sort ι occurring in F_th a carrier set I(ι), and an interpretation J that assigns
to every symbol f ∈ F_th with sort declaration ι1 × ⋯ × ιn → κ a function
f_J : I(ι1) × ⋯ × I(ιn) → I(κ). Moreover, for every sort ι occurring in F_th we
assume a set Val_ι ⊆ F_th of value symbols, such that all c ∈ Val_ι are constants
of sort ι and J constitutes a bijective mapping between Val_ι and I(ι). Thus
there exists a constant symbol in F_th for every value in the carrier set. The
interpretation J naturally extends to a mapping ⟦·⟧ from ground terms in T(F_th)
to values in Val = ⋃_{ι ∈ Dom(I)} Val_ι: ⟦f(t1,…,tn)⟧ = f_J(⟦t1⟧,…,⟦tn⟧) for all
f(t1,…,tn) ∈ T(F_th). So every ground term in T(F_th) has a unique value.
We demand that theory symbols and term symbols overlap only on values, i.e.,
F_te ∩ F_th ⊆ Val. A term in T(F_th, V) is called a logical term.

Positions are strings of positive natural numbers used to address subterms.
The empty string is denoted by ε. We write q ≤ p and say that p is below q if
qq′ = p for some position q′, in which case p\q is defined to be q′. Furthermore,
q < p if q ≤ p and q ≠ p. Finally, positions q and p are parallel, written as q ∥ p, if
neither q ≤ p nor p ≤ q. The set of positions of a term t is defined as Pos(t) = {ε}
if t is a variable or a constant, and as Pos(t) = {ε} ∪ {iq | 1 ≤ i ≤ n and q ∈
Pos(ti)} if t = f(t1,…,tn) with n ≥ 1. The subterm of t at position p ∈ Pos(t)
is defined as t|p = t if p = ε and as t|p = ti|q if p = iq and t = f(t1,…,tn). We
write s[t]p for the result of replacing the subterm at position p of s with t. We
write Pos_V(t) for {p ∈ Pos(t) | t|p ∈ V} and Pos_F(t) for Pos(t) \ Pos_V(t). The
set of variables occurring in the term t is denoted by Var(t). A term t is linear
if every variable occurs at most once in it. A substitution is a mapping σ from
V to T(F, V) such that its domain {x ∈ V | σ(x) ≠ x} is finite. We write tσ for
the result of applying σ to the term t.
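The position notions can be made concrete with a short sketch. It assumes that variables are Python strings, that f(t1,…,tn) is the tuple (f, t1, …, tn) (so a constant c is ("c",)), and that positions are tuples of positive integers with () playing the role of ε; the function names are chosen for this illustration.

```python
def positions(t):
    """Pos(t): all positions of t, the root position () first."""
    if isinstance(t, str):
        return [()]
    return [()] + [(i,) + q
                   for i, arg in enumerate(t[1:], start=1)
                   for q in positions(arg)]


def subterm(t, p):
    """t|p: the subterm of t at position p."""
    return t if not p else subterm(t[p[0]], p[1:])


def replace(s, p, t):
    """s[t]p: replace the subterm of s at position p with t."""
    if not p:
        return t
    i = p[0]
    return s[:i] + (replace(s[i], p[1:], t),) + s[i + 1:]


def is_below(q, p):
    """q ≤ p: p is below q, i.e. q is a prefix of p."""
    return p[:len(q)] == q


def parallel(q, p):
    """q ∥ p: neither position is a prefix of the other."""
    return not is_below(q, p) and not is_below(p, q)


def var_positions(t):
    """Pos_V(t): the positions of t that carry a variable."""
    return [p for p in positions(t) if isinstance(subterm(t, p), str)]


term = ("max", "x", ("+", "y", ("1",)))
```

For the term max(x, y + 1) the positions are ε, 1, 2, 21 and 22, and the variable positions are 1 and 21.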

We assume the existence of a sort $\mathsf{bool}$ such that $\mathcal{I}(\mathsf{bool}) = \mathbb{B} = \{\top, \bot\}$,
$\mathsf{Val}_{\mathsf{bool}} = \{\mathsf{true}, \mathsf{false}\}$, $[\![\mathsf{true}]\!] = \top$, and $[\![\mathsf{false}]\!] = \bot$ hold. Logical terms of sort
$\mathsf{bool}$ are called constraints. A constraint $\varphi$ is valid if $[\![\varphi\gamma]\!] = \top$ for all substitutions
$\gamma$ such that $\gamma(x) \in \mathsf{Val}$ for all $x \in \mathsf{Var}(\varphi)$.

A constrained rewrite rule is a triple $\ell \to r\ [\varphi]$ where $\ell, r \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ are terms
of the same sort such that $\mathsf{root}(\ell) \in \mathcal{F}_{\mathsf{te}} \setminus \mathcal{F}_{\mathsf{th}}$ and $\varphi$ is a logical term of sort $\mathsf{bool}$.
If $\varphi = \mathsf{true}$ then the constraint is often omitted, and the rule is denoted as $\ell \to r$.
We denote the set $\mathsf{Var}(\varphi) \cup (\mathsf{Var}(r) \setminus \mathsf{Var}(\ell))$ of logical variables in $\rho\colon \ell \to r\ [\varphi]$ by
$\mathsf{LVar}(\rho)$. We write $\mathsf{EVar}(\rho)$ for the set $\mathsf{Var}(r) \setminus (\mathsf{Var}(\ell) \cup \mathsf{Var}(\varphi))$ of variables that
appear only in the right-hand side of $\rho$. Note that extra variables in right-hand
sides are allowed, but they may only be instantiated by values. This is useful
to model user input or random choice [3]. A set of constrained rewrite rules is
called a logically constrained rewrite system (LCTRS for short).
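
The two variable sets can be computed directly from their definitions. The following is a minimal sketch under an assumed encoding (variables as strings, values as Python integers, applications and constraints as tuples `(symbol, arg1, ..., argn)`); the names `variables`, `lvar`, `evar` and `TRUE` are hypothetical and chosen only for this illustration.

```python
def variables(t):
    """Var(t) for a term encoded as str (variable), int (value) or tuple."""
    if isinstance(t, str):
        return {t}
    if isinstance(t, tuple):
        result = set()
        for arg in t[1:]:
            result |= variables(arg)
        return result
    return set()

def lvar(lhs, rhs, phi):
    """LVar(rho) = Var(phi) united with (Var(r) minus Var(l))."""
    return variables(phi) | (variables(rhs) - variables(lhs))

def evar(lhs, rhs, phi):
    """EVar(rho) = Var(r) minus (Var(l) united with Var(phi))."""
    return variables(rhs) - (variables(lhs) | variables(phi))

TRUE = ("true",)   # the omitted constraint of an unconstrained rule
```

For an unconstrained rule such as `f(x) -> g(y)`, both sets are `{"y"}`: the extra variable `y` is logical and may therefore only be instantiated by values.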

The LCTRS R introduced in the example below computes the maximum of 
two integers. 


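Example 1. Before giving the rules, we need to define the term and theory symbols,
the carrier sets and interpretation functions:

\[
\begin{aligned}
\mathcal{F}_{\mathsf{te}} &= \{\mathsf{max} : \mathsf{int} \times \mathsf{int} \to \mathsf{int}\} \cup \{0, 1, \dots : \mathsf{int}\}
  \qquad \mathcal{I}(\mathsf{bool}) = \mathbb{B} \qquad \mathcal{I}(\mathsf{int}) = \mathbb{Z} \\
\mathcal{F}_{\mathsf{th}} &= \{0, 1, \dots : \mathsf{int}\} \cup \{\mathsf{true}, \mathsf{false} : \mathsf{bool}\} \cup \{\neg : \mathsf{bool} \to \mathsf{bool}\} \\
  &\quad \cup \{- : \mathsf{int} \to \mathsf{int}\} \cup \{\land : \mathsf{bool} \times \mathsf{bool} \to \mathsf{bool}\} \\
  &\quad \cup \{+, - : \mathsf{int} \times \mathsf{int} \to \mathsf{int}\} \cup \{\leq, \geq, <, >, = \; : \mathsf{int} \times \mathsf{int} \to \mathsf{bool}\}
\end{aligned}
\]

The interpretations for theory symbols follow the usual semantics given in the
SMT-LIB theory Ints¹ used by the SMT-LIB logic QF_LIA. The LCTRS $\mathcal{R}$
consists of the following constrained rewrite rules

\[
\mathsf{max}(x,y) \to x\ [x \geq y] \qquad \mathsf{max}(x,y) \to y\ [y \geq x] \qquad \mathsf{max}(x,y) \to \mathsf{max}(y,x)
\]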


In later examples we refrain from spelling out the signature and interpretations
of the theory Ints. We now define rewriting using constrained rewrite rules.
LCTRSs admit two kinds of rewrite steps. Rewrite rules give rise to rule steps,
provided the constraint of the rule is satisfied. In addition, theory calls of the
form $f(v_1,\dots,v_n)$ with $f \in \mathcal{F}_{\mathsf{th}} \setminus \mathsf{Val}$ and values $v_1,\dots,v_n$ can be evaluated in a
calculation step. In the definition below, a substitution $\sigma$ is said to respect a rule
$\rho\colon \ell \to r\ [\varphi]$, denoted by $\sigma \vDash \rho$, if $\mathsf{Dom}(\sigma) = \mathsf{Var}(\ell) \cup \mathsf{Var}(r) \cup \mathsf{Var}(\varphi)$, $\sigma(x) \in \mathsf{Val}$
for all $x \in \mathsf{LVar}(\rho)$, and $\varphi\sigma$ is valid. Moreover, a constraint $\varphi$ is respected by $\sigma$,
denoted by $\sigma \vDash \varphi$, if $\sigma(x) \in \mathsf{Val}$ for all $x \in \mathsf{Var}(\varphi)$ and $\varphi\sigma$ is valid.


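Definition 1. Let $\mathcal{R}$ be an LCTRS. A rule step $s \to_{\mathsf{ru}} t$ satisfies $s|_p = \ell\sigma$
and $t = s[r\sigma]_p$ for some position $p$ and constrained rewrite rule $\ell \to r\ [\varphi]$ that
is respected by the substitution $\sigma$. A calculation step $s \to_{\mathsf{ca}} t$ satisfies $s|_p = f(v_1,\dots,v_n)$
and $t = s[v]_p$ for some $f \in \mathcal{F}_{\mathsf{th}} \setminus \mathsf{Val}$, $v_1,\dots,v_n \in \mathsf{Val}$ with
$v = [\![f(v_1,\dots,v_n)]\!]$. In this case $f(x_1,\dots,x_n) \to y\ [y = f(x_1,\dots,x_n)]$ with a fresh
variable $y$ is a calculation rule. The set of all calculation rules is denoted by $\mathcal{R}_{\mathsf{ca}}$.
The relation $\to_{\mathcal{R}}$ associated with $\mathcal{R}$ is the union $\to_{\mathsf{ru}} \cup \to_{\mathsf{ca}}$.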


¹ http://smtlib.cs.uiowa.edu/Theories/Ints.smt2.

We sometimes write $\to_{p \mid \rho \mid \sigma}$ to indicate that the rewrite step takes place at
position $p$, using the constrained rewrite rule $\rho$ with substitution $\sigma$.


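Example 2. We have $\mathsf{max}(1 + 2, 4) \to_{\mathcal{R}} \mathsf{max}(3, 4) \to_{\mathcal{R}} \mathsf{max}(4, 3) \to_{\mathcal{R}} 4$ in the
LCTRS of Example 1. The first step is a calculation step. In the third step we
apply the rule $\mathsf{max}(x,y) \to x\ [x \geq y]$ with substitution $\sigma = \{x \mapsto 4, y \mapsto 3\}$.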
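
The rewrite sequence of Example 2 can be replayed mechanically. The sketch below is an illustration only and not the authors' implementation: it hard-codes the three rules of Example 1 plus calculation steps for `+`, handles ground terms only, and uses a hypothetical encoding in which values are Python integers and other terms are tuples.

```python
def root_steps(t):
    """All results of one rule step or calculation step at the root of t."""
    out = []
    if isinstance(t, tuple) and len(t) == 3:
        f, a, b = t
        if isinstance(a, int) and isinstance(b, int):
            if f == "+":
                out.append(a + b)            # calculation step
            if f == "max":
                if a >= b:
                    out.append(a)            # max(x,y) -> x [x >= y]
                if b >= a:
                    out.append(b)            # max(x,y) -> y [y >= x]
        if f == "max":
            # max(x,y) -> max(y,x) has no logical variables,
            # so x and y may be bound to arbitrary terms
            out.append(("max", b, a))
    return out

def steps(t):
    """All one-step reducts of t, at the root or below."""
    out = list(root_steps(t))
    if isinstance(t, tuple):
        for i in range(1, len(t)):
            for u in steps(t[i]):
                out.append(t[:i] + (u,) + t[i + 1:])
    return out
```

Each term of the sequence in Example 2 occurs among the one-step reducts of its predecessor, for instance `("max", 3, 4) in steps(("max", ("+", 1, 2), 4))`.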


3 Confluence 


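In this paper we are concerned with the confluence of LCTRSs. An LCTRS $\mathcal{R}$
is confluent if $t \to^*_{\mathcal{R}} \cdot \mathrel{{}^*_{\mathcal{R}}{\leftarrow}} u$ for all terms $s$, $t$ and $u$ such that $t \mathrel{{}^*_{\mathcal{R}}{\leftarrow}} s \to^*_{\mathcal{R}} u$.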
Confluence criteria for TRSs are based on critical pairs. Critical pairs for LCTRS 
were introduced in [8]. The difference with the definition below is that we add 
dummy constraints for extra variables in right-hand sides of rewrite rules. 


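Definition 2. An overlap of an LCTRS $\mathcal{R}$ is a triple $\langle \rho_1, p, \rho_2 \rangle$ with rules
$\rho_1\colon \ell_1 \to r_1\ [\varphi_1]$ and $\rho_2\colon \ell_2 \to r_2\ [\varphi_2]$, satisfying the following conditions:

1. $\rho_1$ and $\rho_2$ are variable-disjoint variants of rewrite rules in $\mathcal{R} \cup \mathcal{R}_{\mathsf{ca}}$,

2. $p \in \mathsf{Pos}_{\mathcal{F}}(\ell_2)$,

3. $\ell_1$ and $\ell_2|_p$ are unifiable with a mgu $\sigma$ such that $\sigma(x) \in \mathsf{Val} \cup \mathcal{V}$ for all
   $x \in \mathsf{LVar}(\rho_1) \cup \mathsf{LVar}(\rho_2)$,

4. $\varphi_1\sigma \land \varphi_2\sigma$ is satisfiable, and

5. if $p = \epsilon$ then $\rho_1$ and $\rho_2$ are not variants, or $\mathsf{Var}(r_1) \nsubseteq \mathsf{Var}(\ell_1)$.

In this case we call $\ell_2\sigma[r_1\sigma]_p \approx r_2\sigma\ [\varphi_1\sigma \land \varphi_2\sigma \land \psi\sigma]$ a constrained critical pair
obtained from the overlap $\langle \rho_1, p, \rho_2 \rangle$. Here

\[
\psi = \bigwedge \{x = x \mid x \in \mathsf{EVar}(\rho_1) \cup \mathsf{EVar}(\rho_2)\}
\]

The set of all constrained critical pairs of $\mathcal{R}$ is denoted by $\mathsf{CCP}(\mathcal{R})$.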


In the following we drop “constrained” and speak of critical pairs. The condition
$\mathsf{Var}(r_1) \nsubseteq \mathsf{Var}(\ell_1)$ in the fifth condition is essential to correctly deal with
extra variables in rewrite rules. The equations ($\psi$) added to the constraint of a
critical pair save the information which variables in a critical pair were introduced
by variables only occurring in the right-hand side of a rewrite rule and
therefore should only be instantiated by values. Critical pairs as defined in [8, 12]
lack this information. The proof of Theorem 2 in the next section makes clear
why those trivial equations are essential for our confluence criteria, see also
Example 9.


Example 3. Consider the LCTRS $\mathcal{R}$ consisting of the rule

\[
\rho\colon f(x) \to z\ [x = z^2]
\]

The variable $z$ does not occur in the left-hand side and the condition $\mathsf{Var}(r_1) \nsubseteq \mathsf{Var}(\ell_1)$
ensures that $\rho$ overlaps with (a variant of) itself at the root position.
Note that $\mathcal{R}$ is not confluent due to the non-joinable local peak $-4 \leftarrow f(16) \to 4$.

Example 4. The LCTRS $\mathcal{R}$ of Example 1 admits the following critical pairs:

\[
\begin{aligned}
x &\approx y\ [x \geq y \land y \geq x] & \langle 1, \epsilon, 2 \rangle \\
x &\approx \mathsf{max}(y, x)\ [x \geq y] & \langle 1, \epsilon, 3 \rangle \\
y &\approx \mathsf{max}(y, x)\ [y \geq x] & \langle 2, \epsilon, 3 \rangle
\end{aligned}
\]

The originating overlap is given on the right, where we number the rewrite rules
from left to right in Example 1.
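
The non-joinable peak of Example 3 can be reproduced with a few lines of code. This is a bounded sketch, not a decision procedure: it enumerates the values $z$ with $x = z^2$ inside an assumed search window (the `bound` parameter and the function name are hypothetical), which are exactly the reducts of $f(x)$ reachable in one rule step within that window.

```python
def f_reducts(v, bound=100):
    """Values z in [-bound, bound] with v == z*z, i.e. the one-step reducts
    of f(v) under the rule f(x) -> z [x = z^2]."""
    return [z for z in range(-bound, bound + 1) if z * z == v]
```

For `v = 16` the function returns two distinct values. Since values are normal forms, the two reducts cannot be joined.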


Actually, there are three more overlaps since the position of overlap ($\epsilon$) is
the root position. Such overlaps are called overlays and always come in pairs.
For instance, $\mathsf{max}(y, x) \approx x\ [x \geq y]$ is the critical pair originating from $\langle 3, \epsilon, 1 \rangle$.
For confluence criteria based on symmetric joinability conditions of critical pairs
(like weak orthogonality and joinability of critical pairs for terminating systems)
we need to consider just one critical pair, but this is not true for the criteria
presented in the next section.

Logically constrained rewriting aims to rewrite (unconstrained) terms with 
constrained rules. However, for the sake of analysis, rewriting constrained terms 
is useful. In particular, since critical pairs in LCTRSs come with a constraint, 
confluence criteria need to consider constrained terms. The relevant notions 
defined below originate from [3,8]. 


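Definition 3. A constrained term is a pair $s\ [\varphi]$ of a term $s$ and a constraint $\varphi$.
Two constrained terms $s\ [\varphi]$ and $t\ [\psi]$ are equivalent, denoted by $s\ [\varphi] \sim t\ [\psi]$,
if for every substitution $\gamma$ respecting $\varphi$ there is some substitution $\delta$ that respects
$\psi$ such that $s\gamma = t\delta$, and vice versa. Let $\mathcal{R}$ be an LCTRS and $s\ [\varphi]$ a constrained
term. If $s|_p = \ell\sigma$ for some constrained rewrite rule $\rho\colon \ell \to r\ [\psi]$, position $p$, and
substitution $\sigma$ such that $\sigma(x) \in \mathsf{Val} \cup \mathsf{Var}(\varphi)$ for all $x \in \mathsf{LVar}(\rho)$, $\varphi$ is satisfiable
and $\varphi \Rightarrow \psi\sigma$ is valid then

\[
s\ [\varphi] \to_{\mathsf{ru}} s[r\sigma]_p\ [\varphi]
\]

is a rule step. If $s|_p = f(s_1,\dots,s_n)$ with $f \in \mathcal{F}_{\mathsf{th}} \setminus \mathcal{F}_{\mathsf{te}}$ and $s_1,\dots,s_n \in \mathsf{Val} \cup \mathsf{Var}(\varphi)$ then

\[
s\ [\varphi] \to_{\mathsf{ca}} s[x]_p\ [\varphi \land x = f(s_1,\dots,s_n)]
\]

is a calculation step. Here $x$ is a fresh variable. We write $\to_{\mathcal{R}}$ for $\to_{\mathsf{ru}} \cup \to_{\mathsf{ca}}$
and the rewrite relation $\stackrel{\sim}{\to}_{\mathcal{R}}$ on constrained terms is defined as $\sim \cdot \to_{\mathcal{R}} \cdot \sim$.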


Positions in connection with $\stackrel{\sim}{\to}_{\mathcal{R}}$ steps always refer to the underlying steps
in $\to_{\mathcal{R}}$. We give an example of constrained rewriting.


Example 5. Consider again the LCTRS $\mathcal{R}$ of Example 1. We have

\[
\begin{aligned}
\mathsf{max}(x + y, 6)\ [x \geq 2 \land y \geq 4] &\to_{\mathcal{R}} \mathsf{max}(z, 6)\ [x \geq 2 \land y \geq 4 \land z = x + y] \\
&\to_{\mathcal{R}} z\ [x \geq 2 \land y \geq 4 \land z = x + y]
\end{aligned}
\]

The first step is a calculation step. The second step is a rule step using the rule
$\mathsf{max}(x,y) \to x\ [x \geq y]$ with the substitution $\sigma = \{x \mapsto z, y \mapsto 6\}$. Note that the
constraint $(x \geq 2 \land y \geq 4 \land z = x + y) \Rightarrow z \geq 6$ is valid.

Definition 4. A critical pair $s \approx t\ [\varphi]$ is trivial if $s\sigma = t\sigma$ for every substitution
$\sigma$ with $\sigma \vDash \varphi$.² A left-linear LCTRS having only trivial critical pairs is called
weakly orthogonal. A left-linear TRS without critical pairs is called orthogonal.


The following result is from [8]. 


Theorem 1. Weakly orthogonal LCTRSs are confluent.


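Example 6. The following left-linear LCTRS computes the Ackermann function
using term symbols from $\mathcal{F}_{\mathsf{te}} = \{\mathsf{ack} : \mathsf{int} \times \mathsf{int} \to \mathsf{int}\} \cup \{0, 1, \dots : \mathsf{int}\}$ and the
same theory symbols, carrier sets and interpretations as in Example 1:

\[
\begin{aligned}
\mathsf{ack}(0, n) &\to n + 1 & [n \geq 0] \\
\mathsf{ack}(m, 0) &\to \mathsf{ack}(m - 1, 1) & [m > 0] \\
\mathsf{ack}(m, n) &\to \mathsf{ack}(m - 1, \mathsf{ack}(m, n - 1)) & [m > 0 \land n > 0] \\
\mathsf{ack}(m, n) &\to 0 & [m < 0 \lor n < 0]
\end{aligned}
\]

Since the conjunction of any two constraints is unsatisfiable, $\mathcal{R}$ lacks critical
pairs. Hence $\mathcal{R}$ is confluent by Theorem 1.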
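
The absence of overlaps in Example 6 can be observed on concrete values. The sketch below is an illustration under assumptions: the guards follow the four rules as read above (in particular $n \geq 0$ for the first rule), the function names are hypothetical, and the check covers ground calls $\mathsf{ack}(m, n)$ on integers only.

```python
def applicable(m, n):
    """Indices of the rules of Example 6 that apply at the root of ack(m, n)."""
    rules = []
    if m == 0 and n >= 0:
        rules.append(1)      # ack(0,n) -> n + 1                  [n >= 0]
    if n == 0 and m > 0:
        rules.append(2)      # ack(m,0) -> ack(m-1, 1)            [m > 0]
    if m > 0 and n > 0:
        rules.append(3)      # ack(m,n) -> ack(m-1, ack(m, n-1))  [m > 0 and n > 0]
    if m < 0 or n < 0:
        rules.append(4)      # ack(m,n) -> 0                      [m < 0 or n < 0]
    return rules

def ack(m, n):
    """Evaluate ack(m, n) by always applying the unique applicable rule."""
    (rule,) = applicable(m, n)   # unpacking fails unless exactly one rule applies
    if rule == 1:
        return n + 1
    if rule == 2:
        return ack(m - 1, 1)
    if rule == 3:
        return ack(m - 1, ack(m, n - 1))
    return 0
```

On every pair of integers exactly one guard holds, so no two rules compete for the same redex; this mirrors, on ground instances, the claim that the conjunction of any two constraints is unsatisfiable.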


The following result is proved in [12] and forms the basis of completion of 
LCTRSs. 


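Lemma 1. Let $\mathcal{R}$ be an LCTRS. If $t \mathrel{{}_{\mathcal{R}}{\leftarrow}} s \to_{\mathcal{R}} u$ then $t \downarrow_{\mathcal{R}} u$ or $t \leftrightarrow_{\mathsf{CCP}(\mathcal{R})} u$.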


In combination with Newman’s Lemma, the following confluence criterion is 
obtained. 


Corollary 1. A terminating LCTRS is confluent if all critical pairs are join- 
able. 


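This is less obvious than it seems. Joinability of a critical pair $s \approx t\ [\varphi]$
cannot simply be defined as $s\ [\varphi] \stackrel{\sim}{\to}{}^*_{\mathcal{R}} \cdot \mathrel{{}^*_{\mathcal{R}}{\stackrel{\sim}{\leftarrow}}} t\ [\varphi]$, as the following example
shows.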


Example 7. Consider the terminating LCTRS $\mathcal{R}$ consisting of the rewrite rules

\[
f(x,y) \to g(x, 1 + 1) \qquad h(f(x,y)) \to h(g(y, 1 + 1))
\]

The single critical pair $h(g(x, 1 + 1)) \approx h(g(y, 1 + 1))$ should not be joinable
because $\mathcal{R}$ is not confluent, but we do have

\[
\begin{aligned}
h(g(x, 1 + 1)) &\to_{\mathsf{ca}} h(g(x, z))\ [z = 1 + 1] \sim h(g(y, v))\ [v = 1 + 1] \\
h(g(y, 1 + 1)) &\to_{\mathsf{ca}} h(g(y, v))\ [v = 1 + 1]
\end{aligned}
\]

due to the equivalence relation $\sim$ on constrained terms; since $x$ and $y$ do not
appear in the constraints, there is no demand that they must be instantiated
with values.


² The triviality condition in [8] is wrong. Here we use the corrected version in an
update of [8] announced on Cynthia Kop’s website (accessible at
https://www.cs.ru.nl/~cynthiakop/frocos13.pdf).

The solution is not to treat the two sides of a critical pair in isolation but
define joinability based on rewriting constrained term pairs. So we view the
symbol $\approx$ in a constrained equation $s \approx t\ [\varphi]$ as a binary constructor symbol
such that the constrained equation can be viewed as a constrained term. Steps
in $s$ take place at positions $\geq 1$ whereas steps in $t$ use positions $\geq 2$. The same
is done in completion of LCTRSs [12].


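Definition 5. We call a constrained equation $s \approx t\ [\varphi]$ trivial if $s\sigma = t\sigma$ for any
substitution $\sigma$ with $\sigma \vDash \varphi$. A critical pair $s \approx t\ [\varphi]$ is joinable if
$s \approx t\ [\varphi] \stackrel{\sim}{\to}{}^*_{\mathcal{R}} u \approx v\ [\psi]$ and $u \approx v\ [\psi]$ is trivial.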


We revisit Example 7. 


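Example 8. For the critical pair in Example 7 we obtain

\[
\begin{aligned}
h(g(x, 1 + 1)) &\approx h(g(y, 1 + 1)) \\
\stackrel{\sim}{\to}_{\mathcal{R}}\ h(g(x, v)) &\approx h(g(y, 1 + 1))\ [v = 1 + 1] \\
\stackrel{\sim}{\to}_{\mathcal{R}}\ h(g(x, v)) &\approx h(g(y, z))\ [v = 1 + 1 \land z = 1 + 1]
\end{aligned}
\]

The substitution $\sigma = \{v \mapsto 2, z \mapsto 2\}$ respects the constraint $v = 1 + 1 \land z = 1 + 1$
but does not equate $h(g(x, v))$ and $h(g(y, z))$.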


The converse of Corollary 1 also holds, but note that in contrast to TRSs,
joinability of critical pairs is not a decidable criterion for terminating LCTRSs,
due to the undecidable triviality condition. Moreover, for the converse to hold,
it is essential that critical pairs contain the trivial equations $\psi$ in Definition 2.


Example 9. Consider the LCTRS $\mathcal{R}$ consisting of the rules

\[
f(x) \to g(y) \qquad g(y) \to a\ [y = y]
\]

which admits the critical pair $g(y) \approx g(y')\ [y = y \land y' = y']$ originating from the
overlap $\langle f(x) \to g(y), \epsilon, f(x') \to g(y') \rangle$. This critical pair is joinable as $y$ and $y'$
are restricted to values and thus both sides rewrite to $a$ using the second rule.
As $\mathcal{R}$ is also terminating, it is confluent by Corollary 1. If we were to drop $\psi$ in
Definition 2, we would obtain the non-joinable critical pair $g(y) \approx g(y')$ instead
and wrongly conclude non-confluence.


4 Main Results 


We start with extending a confluence result of Huet [4] for linear TRSs. Below
we write $\to_{\geq p}$ to indicate that the position of the contracted redex in the step
is below position $p$.


Definition 6. A critical pair $s \approx t\ [\varphi]$ is strongly closed if

1. $s \approx t\ [\varphi] \stackrel{\sim}{\to}{}^{*}_{\geq 1} \cdot \stackrel{\sim}{\to}{}^{=}_{\geq 2} u \approx v\ [\psi]$ for some trivial $u \approx v\ [\psi]$, and

2. $s \approx t\ [\varphi] \stackrel{\sim}{\to}{}^{=}_{\geq 1} \cdot \stackrel{\sim}{\to}{}^{*}_{\geq 2} u \approx v\ [\psi]$ for some trivial $u \approx v\ [\psi]$.

A binary relation $\to$ on terms is strongly confluent if $t \to^* \cdot \mathrel{{}^{=}{\leftarrow}} u$ for all
terms $s$, $t$ and $u$ with $t \leftarrow s \to u$. (By symmetry, also $t \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} u$ is required.)
Strong confluence is a well-known sufficient condition for confluence. Huet [4]
proved that linear TRSs are strongly confluent if all critical pairs are strongly
closed. Below we extend this result to LCTRSs, using the above definition of
strongly closed constrained critical pairs.
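
On a finite abstract relation, strong confluence can be checked by brute force, which may help to see how it differs from plain confluence. The sketch below assumes a relation given as a dictionary from elements to sets of direct successors; the function names are hypothetical and the code is unrelated to the authors' tool.

```python
def reachable(rel, a):
    """All b with a ->* b."""
    seen, todo = {a}, [a]
    while todo:
        x = todo.pop()
        for y in rel.get(x, ()):
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen

def strongly_confluent(rel):
    """Check that every peak t <- s -> u admits v with t ->* v and u ->= v.
    Both orientations of each peak are visited, which covers the symmetric
    requirement as well."""
    for succs in rel.values():
        for t in succs:
            for u in succs:
                at_most_one_step = {u} | set(rel.get(u, ()))
                if not (reachable(rel, t) & at_most_one_step):
                    return False
    return True

diamond = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}}
detour = {"a": {"b", "c"}, "c": {"d"}, "d": {"b"}}
```

The relation `detour` is confluent (every element reaches `b`) but not strongly confluent, because `c` needs two steps to reach the normal form `b`. Strong confluence is thus sufficient but not necessary for confluence.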


Theorem 2. A linear LCTRS is strongly confluent if all its critical pairs are 
strongly closed. 


We give full proof details in order to illustrate the complications caused by 
constrained rewrite rules. The following result from [12] plays an important role. 


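Lemma 2. Suppose $s \approx t\ [\varphi] \stackrel{\sim}{\to}_p u \approx v\ [\psi]$ and $\gamma \vDash \varphi$. If $p \geq 1$ then $s\gamma \to u\delta$
and $t\gamma = v\delta$ for some substitution $\delta$ with $\delta \vDash \psi$. If $p \geq 2$ then $s\gamma = u\delta$ and
$t\gamma \to v\delta$ for some substitution $\delta$ with $\delta \vDash \psi$.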


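Proof (of Theorem 2). Consider an arbitrary local peak

\[
t \mathrel{{}_{p_1 \mid \rho_1 \mid \sigma_1}{\leftarrow}} s \to_{p_2 \mid \rho_2 \mid \sigma_2} u
\]

with rewrite rules $\rho_1\colon \ell_1 \to r_1\ [\varphi_1]$ and $\rho_2\colon \ell_2 \to r_2\ [\varphi_2]$ from $\mathcal{R} \cup \mathcal{R}_{\mathsf{ca}}$. We
may assume that $\rho_1$ and $\rho_2$ have no variables in common, and consequently
$\mathsf{Dom}(\sigma_1) \cap \mathsf{Dom}(\sigma_2) = \varnothing$. We have $s|_{p_1} = \ell_1\sigma_1$, $t = s[r_1\sigma_1]_{p_1}$ and $\sigma_1 \vDash \rho_1$.
Likewise, $s|_{p_2} = \ell_2\sigma_2$, $u = s[r_2\sigma_2]_{p_2}$ and $\sigma_2 \vDash \rho_2$. If $p_1 \parallel p_2$ then

\[
t \to_{p_2 \mid \rho_2 \mid \sigma_2} t[r_2\sigma_2]_{p_2} = u[r_1\sigma_1]_{p_1} \mathrel{{}_{p_1 \mid \rho_1 \mid \sigma_1}{\leftarrow}} u
\]

Hence both $t \to^* \cdot \mathrel{{}^{=}{\leftarrow}} u$ and $t \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} u$. If $p_1$ and $p_2$ are not parallel
then $p_1 \leq p_2$ or $p_2 < p_1$. Without loss of generality, we consider $p_1 \leq p_2$. Let
$q = p_2 \backslash p_1$. We do a case analysis on whether or not $q \in \mathsf{Pos}_{\mathcal{F}}(\ell_1)$.

- First suppose $q \notin \mathsf{Pos}_{\mathcal{F}}(\ell_1)$. Let $q = q_1q_2$ such that $q_1 \in \mathsf{Pos}_{\mathcal{V}}(\ell_1)$ and let $x$ be
  the variable in $\ell_1$ at position $q_1$. We have $\ell_2\sigma_2 = x\sigma_1|_{q_2}$ and thus $\sigma_1(x) \notin \mathsf{Val}$.
  Define the substitution $\sigma_1'$ as follows:

  \[
  \sigma_1'(y) = \begin{cases} x\sigma_1[r_2\sigma_2]_{q_2} & \text{if } y = x \\ \sigma_1(y) & \text{otherwise} \end{cases}
  \]

  We show $t \to^{=} s[r_1\sigma_1']_{p_1} \leftarrow u$, which yields $t \to^* \cdot \mathrel{{}^{=}{\leftarrow}} u$ and $t \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} u$.
  Since $\mathcal{R}$ is left-linear, $\ell_1\sigma_1' = \ell_1\sigma_1[x\sigma_1']_{q_1} = \ell_1\sigma_1[x\sigma_1[r_2\sigma_2]_{q_2}]_{q_1} = \ell_1\sigma_1[r_2\sigma_2]_q$
  and thus $u = s[r_2\sigma_2]_{p_2} = s[\ell_1\sigma_1[r_2\sigma_2]_q]_{p_1} = s[\ell_1\sigma_1']_{p_1}$. If we can show $\sigma_1' \vDash \rho_1$
  then $u \to s[r_1\sigma_1']_{p_1}$. Consider an arbitrary variable $y \in \mathsf{LVar}(\rho_1)$. If $y \neq x$
  then $\sigma_1'(y) = \sigma_1(y) \in \mathsf{Val}$ since $\sigma_1 \vDash \rho_1$. If $y = x$ then $x \in \mathsf{Var}(\varphi_1)$ since
  $x \in \mathsf{Var}(\ell_1)$. However, this contradicts $\sigma_1 \vDash \rho_1$ as $\sigma_1(x) \notin \mathsf{Val}$. So $\sigma_1'(y) = \sigma_1(y)$
  for all $y \in \mathsf{LVar}(\rho_1)$ and thus $\sigma_1' \vDash \rho_1$ is an immediate consequence of
  $\sigma_1 \vDash \rho_1$. It remains to show $t \to^{=} s[r_1\sigma_1']_{p_1}$. If $x \notin \mathsf{Var}(r_1)$ then $r_1\sigma_1' = r_1\sigma_1$
  and thus $t = s[r_1\sigma_1']_{p_1}$. If $x \in \mathsf{Var}(r_1)$ then there exists a unique position
  $q' \in \mathsf{Pos}_{\mathcal{V}}(r_1)$ such that $r_1|_{q'} = x$, due to the right-linearity of $\mathcal{R}$. Hence
  $r_1\sigma_1' = r_1\sigma_1[x\sigma_1[r_2\sigma_2]_{q_2}]_{q'} = r_1\sigma_1[r_2\sigma_2]_{q'q_2}$. Since $r_1\sigma_1|_{q'q_2} = \ell_2\sigma_2$ we obtain
  $t = s[r_1\sigma_1]_{p_1} \to_{p_1q'q_2 \mid \rho_2 \mid \sigma_2} s[r_1\sigma_1']_{p_1}$ as desired.

- Next suppose $q \in \mathsf{Pos}_{\mathcal{F}}(\ell_1)$. The substitution $\sigma' = \sigma_1 \cup \sigma_2$ satisfies
  $\ell_1|_q\sigma' = \ell_1|_q\sigma_1 = \ell_2\sigma_2 = \ell_2\sigma'$ and thus is a unifier of $\ell_1|_q$ and $\ell_2$. Since $\sigma_1 \vDash \rho_1$
  and $\sigma_2 \vDash \rho_2$, $\sigma'(x) \in \mathsf{Val}$ for all $x \in \mathsf{LVar}(\rho_1) \cup \mathsf{LVar}(\rho_2)$. Let $\sigma$ be an
  mgu of $\ell_1|_q$ and $\ell_2$. Since $\sigma$ is at least as general as $\sigma'$, $\sigma(x) \in \mathsf{Val} \cup \mathcal{V}$
  for all $x \in \mathsf{LVar}(\rho_1) \cup \mathsf{LVar}(\rho_2)$. Since $\varphi_1\sigma' = \varphi_1\sigma_1$ and $\varphi_2\sigma' = \varphi_2\sigma_2$ are
  valid, $\varphi_1\sigma \land \varphi_2\sigma$ is satisfiable. Hence conditions 1, 2, 3 and 4 in Definition 2
  hold for the triple $\langle \rho_2, q, \rho_1 \rangle$. If condition 5 is not fulfilled then $q = \epsilon$ (and
  thus $p_1 = p_2$), $\rho_2$ and $\rho_1$ are variants, and $\mathsf{Var}(r_2) \subseteq \mathsf{Var}(\ell_2)$ (and thus also
  $\mathsf{Var}(r_1) \subseteq \mathsf{Var}(\ell_1)$). Hence $\ell_1\sigma_1 = \ell_2\sigma_2$ and $r_1\sigma_1 = r_2\sigma_2$, and thus $t = u$. In
  the remaining case condition 5 holds and hence $\langle \rho_2, q, \rho_1 \rangle$ is an overlap. By
  definition, $\ell_1\sigma[r_2\sigma]_q \approx r_1\sigma\ [\varphi_2\sigma \land \varphi_1\sigma \land \psi\sigma]$ with

  \[
  \psi = \bigwedge \{x = x \mid x \in \mathsf{EVar}(\rho_1) \cup \mathsf{EVar}(\rho_2)\}
  \]

  is a critical pair. To simplify the notation, we abbreviate $\ell_1\sigma[r_2\sigma]_q$ to $s'$,
  $r_1\sigma$ to $t'$, and $\varphi_2\sigma \land \varphi_1\sigma \land \psi\sigma$ to $\varphi'$. Critical pairs are strongly closed by
  assumption, and thus both

  1. $s' \approx t'\ [\varphi'] \stackrel{\sim}{\to}{}^{*}_{\geq 1} \cdot \stackrel{\sim}{\to}{}^{=}_{\geq 2} u \approx v\ [\psi]$ for some trivial $u \approx v\ [\psi]$, and

  2. $s' \approx t'\ [\varphi'] \stackrel{\sim}{\to}{}^{=}_{\geq 1} \cdot \stackrel{\sim}{\to}{}^{*}_{\geq 2} u \approx v\ [\psi]$ for some trivial $u \approx v\ [\psi]$.

  Let $\gamma$ be the substitution such that $\sigma\gamma = \sigma'$. We claim that $\gamma$ respects $\varphi'$. So
  let $x \in \mathsf{Var}(\varphi') = \mathsf{Var}(\varphi_2\sigma \land \varphi_1\sigma \land \psi\sigma)$. We have

  \[
  \mathsf{LVar}(\rho_1) = \mathsf{Var}(\varphi_1) \cup \mathsf{EVar}(\rho_1) \qquad \mathsf{LVar}(\rho_2) = \mathsf{Var}(\varphi_2) \cup \mathsf{EVar}(\rho_2)
  \]

  Together with $\mathsf{Var}(\psi) = \mathsf{EVar}(\rho_1) \cup \mathsf{EVar}(\rho_2)$ we obtain

  \[
  \mathsf{LVar}(\rho_1) \cup \mathsf{LVar}(\rho_2) = \mathsf{Var}(\varphi_1) \cup \mathsf{Var}(\varphi_2) \cup \mathsf{Var}(\psi)
  \]

  Since $\sigma'(x) \in \mathsf{Val}$ for all $x \in \mathsf{LVar}(\rho_1) \cup \mathsf{LVar}(\rho_2)$, we obtain $\gamma(x) \in \mathsf{Val}$
  for all $x \in \mathsf{Var}(\varphi')$ and thus $\gamma \vDash \varphi'$. At this point repeated applications of
  Lemma 2 to the constrained rewrite sequence in item 1 yield a substitution
  $\delta$ respecting $\psi$ such that $s'\gamma \to^* u\delta$ and $t'\gamma \to^{=} v\delta$. Since $u \approx v\ [\psi]$ is trivial,
  $u\delta = v\delta$ and hence $s'\gamma \to^* \cdot \mathrel{{}^{=}{\leftarrow}} t'\gamma$. Likewise, $s'\gamma \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} t'\gamma$ is obtained
  from item 2. We have

  \[
  s'\gamma = (\ell_1\sigma[r_2\sigma]_q)\gamma = \ell_1\sigma'[r_2\sigma']_q = \ell_1\sigma_1[r_2\sigma_2]_q \qquad t'\gamma = r_1\sigma' = r_1\sigma_1
  \]

  Moreover, $t = s[r_1\sigma_1]_{p_1} = s[t'\gamma]_{p_1}$ and $u = s[\ell_1\sigma_1[r_2\sigma_2]_q]_{p_1} = s[s'\gamma]_{p_1}$. Since
  rewriting is closed under contexts, we obtain $u \to^* \cdot \mathrel{{}^{=}{\leftarrow}} t$ and $u \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} t$.
  This completes the proof.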


Example 10. Consider the LCTRS $\mathcal{R}$ of Example 1 and its critical pairs in
Example 4. The critical pair

\[
x \approx \mathsf{max}(y, x)\ [x \geq y]
\]

is not trivial, so Theorem 1 is not applicable and the rule $\mathsf{max}(x, y) \to \mathsf{max}(y, x)$
precludes the use of Corollary 1 to infer confluence. We do have

\[
x \approx \mathsf{max}(y, x)\ [x \geq y] \stackrel{\sim}{\to}_{\geq 2} x \approx x\ [x \geq y]
\]

by applying the rule $\mathsf{max}(x, y) \to y\ [y \geq x]$ and the resulting constrained equation
$x \approx x\ [x \geq y]$ is obviously trivial. The same reasoning applies to the critical
pair $y \approx \mathsf{max}(y, x)\ [y \geq x]$. The first critical pair $x \approx y\ [x \geq y \land y \geq x]$
in Example 4 is trivial since any (value) substitution satisfying its constraint
$x \geq y \land y \geq x$ equates $x$ and $y$. By symmetry, all critical pairs of $\mathcal{R}$ are strongly
closed. Since $\mathcal{R}$ is linear, confluence follows from Theorem 2.


The second main result is the extension of Huet’s parallel closedness condition 
on critical pairs in left-linear TRSs [4] to LCTRSs. To this end, we first define 
parallel rewriting for LCTRSs. 


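Definition 7. Let $\mathcal{R}$ be an LCTRS. The relation $\mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}}$ is defined on terms
inductively as follows:

1. $x \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} x$ for all variables $x$,

2. $f(s_1,\dots,s_n) \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} f(t_1,\dots,t_n)$ if $s_i \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} t_i$ for all $1 \leq i \leq n$,

3. $\ell\sigma \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} r\sigma$ with $\ell \to r\ [\varphi] \in \mathcal{R}$ and $\sigma \vDash \ell \to r\ [\varphi]$,

4. $f(v_1,\dots,v_n) \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} v$ with $f \in \mathcal{F}_{\mathsf{th}} \setminus \mathsf{Val}$, $v_1,\dots,v_n \in \mathsf{Val}$ and
   $v = [\![f(v_1,\dots,v_n)]\!]$.

We write $\mathrel{-\!\!\!\parallel\!\!\!\to}_{\geq p}$ to indicate that all positions of contracted redexes in the
parallel step are below $p$. In the next definition we add constraints to parallel
rewriting.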


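Definition 8. Let $\mathcal{R}$ be an LCTRS. The relation $\mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}}$ is defined on constrained
terms inductively as follows:

1. $x\ [\varphi] \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} x\ [\varphi]$ for all variables $x$,

2. $f(s_1,\dots,s_n)\ [\varphi] \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} f(t_1,\dots,t_n)\ [\varphi \land \psi]$ if $s_i\ [\varphi] \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} t_i\ [\varphi \land \psi_i]$ for all
   $1 \leq i \leq n$ and $\psi = \psi_1 \land \dots \land \psi_n$,

3. $\ell\sigma\ [\varphi] \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} r\sigma\ [\varphi]$ with $\rho\colon \ell \to r\ [\psi] \in \mathcal{R}$, $\sigma(x) \in \mathsf{Val} \cup \mathsf{Var}(\varphi)$ for all
   $x \in \mathsf{LVar}(\rho)$, $\varphi$ is satisfiable and $\varphi \Rightarrow \psi\sigma$ is valid,

4. $f(v_1,\dots,v_n)\ [\varphi] \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} v\ [\varphi \land v = f(v_1,\dots,v_n)]$ with $v_1,\dots,v_n \in \mathsf{Val} \cup \mathsf{Var}(\varphi)$,
   $f \in \mathcal{F}_{\mathsf{th}} \setminus \mathsf{Val}$ and $v$ is a fresh variable.

Here we assume that different applications of case 4 result in different fresh
variables. The constraint $\psi$ in case 2 collects the assignments introduced in earlier
applications of case 4. (If there are none, $\psi = \mathsf{true}$ is omitted.) The same holds
for $\psi_1,\dots,\psi_n$. We write $\stackrel{\sim}{\mathrel{-\!\!\!\parallel\!\!\!\to}}_{\mathcal{R}}$ for the relation $\sim \cdot \mathrel{-\!\!\!\parallel\!\!\!\to}_{\mathcal{R}} \cdot \sim$.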


In light of the earlier developments, the following definition is the obvious
adaptation of parallel closedness for LCTRSs.

Definition 9. A critical pair $s \approx t\ [\varphi]$ is parallel closed if

\[
s \approx t\ [\varphi] \stackrel{\sim}{\mathrel{-\!\!\!\parallel\!\!\!\to}}_{\geq 1} u \approx v\ [\psi]
\]

for some trivial $u \approx v\ [\psi]$.


Note that the right-hand side $t$ of the constrained equation $s \approx t\ [\varphi]$ may
change due to the equivalence relation $\sim$, cf. the statement of Lemma 2.


Theorem 3. A left-linear LCTRS is confluent if its critical pairs are parallel 
closed. 


To prove this result, we adapted the formalized proof presented in [10] to 
the constrained setting. The required changes are very similar to the ones in the 
proof of Theorem 2. 


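Example 11. Consider the LCTRS $\mathcal{R}$ with rules

\[
\begin{aligned}
f(x,y) &\to g(a, y + y)\ [y \geq x \land y = 1] & a &\to b \\
h(f(x,y)) &\to h(g(b, 2))\ [x \geq y] & g(x,y) &\to g(y, x)
\end{aligned}
\]

The single critical pair $h(g(a, y + y)) \approx h(g(b, 2))\ [y \geq x \land y = 1 \land x \geq y]$ is
parallel closed:

\[
\begin{aligned}
& h(g(a, y + y)) \approx h(g(b, 2))\ [y \geq x \land y = 1 \land x \geq y] \\
& \quad \stackrel{\sim}{\mathrel{-\!\!\!\parallel\!\!\!\to}}_{\geq 1} h(g(b, z)) \approx h(g(b, 2))\ [y \geq x \land y = 1 \land x \geq y \land z = y + y]
\end{aligned}
\]

and the obtained equation is trivial. Hence $\mathcal{R}$ is confluent by Theorem 3. Note
that the earlier confluence criteria do not apply.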


We also consider the extension of Huet’s result by Toyama [11], which has a
less restricted joinability condition on critical pairs stemming from overlapping
rules at the root position. Such critical pairs are called overlays whereas critical
pairs originating from overlaps $\langle \rho_1, p, \rho_2 \rangle$ with $p > \epsilon$ are called inner critical
pairs.


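Definition 10. An LCTRS $\mathcal{R}$ is almost parallel-closed if every inner critical
pair is parallel closed and every overlay $s \approx t\ [\varphi]$ satisfies

\[
s \approx t\ [\varphi] \stackrel{\sim}{\mathrel{-\!\!\!\parallel\!\!\!\to}}_{\geq 1} \cdot \stackrel{\sim}{\to}{}^{*}_{\geq 2} u \approx v\ [\psi]
\]

for some trivial $u \approx v\ [\psi]$.

Theorem 4. Left-linear almost parallel-closed LCTRSs are confluent.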


Again, the formalized proof of the corresponding result for plain TRSs in [10] 
can be adapted to the constrained setting. 


Example 12. Consider the following variation of the LCTRS $\mathcal{R}$ in Example 11:

\[
\begin{aligned}
f(x,y) &\to g(a, y + y)\ [y \geq x \land y = 1] & a &\to b \\
f(x,y) &\to g(b, 2)\ [x \geq y] & g(x,y) &\to g(y, x)
\end{aligned}
\]

The overlay $g(b, 2) \approx g(a, y + y)\ [x \geq y \land y \geq x \land y = 1]$ is not parallel closed
but one readily confirms that the condition in Definition 10 applies.

5 Automation


As it is very inconvenient and tedious to test by hand if an LCTRS satisfies 
one of the confluence criteria presented in the preceding sections, we provide an 
implementation. The natural choice would be to extend the existing tool Ctrl [9] 
because it is currently the only tool capable of analyzing confluence of LCTRSs. 
However, Ctrl is not actively maintained and not very well documented, so we 
decided to develop a new tool for the analysis of LCTRSs. Our tool is called crest 
(constrained rewriting software). It is written in Haskell, based on the Haskell
term-rewriting³ library and allows the logics QF_LIA, QF_NIA, QF_LRA.

The input format of crest is described on its website.⁴ After parsing the input,
crest checks that the resulting LCTRS is well-typed. Missing sort information 
is inferred. Next it is checked concurrently whether one of the implemented 
confluence criteria applies. crest supports (weak) orthogonality, strong closedness 
and (almost) parallel closedness. The tool outputs the computed critical pairs 
and a “proof” describing how these are closed, based on the first criterion that 
reports a YES result. Below we describe some of the challenges that one faces 
when automating the confluence criteria presented in the preceding sections. 

First of all, how can we determine whether a constrained critical pair or
more generally a constrained equation $s \approx t\ [\varphi]$ is trivial? The following result
explains how this can be solved by an SMT solver.


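Definition 11. Given a constrained equation $s \approx t\ [\varphi]$, the formula $T(s, t, \varphi)$
is inductively defined as follows:

\[
T(s, t, \varphi) = \begin{cases}
\mathsf{true} & \text{if } s = t \\
s = t & \text{if } s, t \in \mathsf{Val} \cup \mathsf{Var}(\varphi) \\
\bigwedge_{i=1}^{n} T(s_i, t_i, \varphi) & \text{if } s = f(s_1,\dots,s_n) \text{ and } t = f(t_1,\dots,t_n) \\
\mathsf{false} & \text{otherwise}
\end{cases}
\]

Lemma 3. A constrained equation $s \approx t\ [\varphi]$ is trivial if and only if the formula
$\varphi \Rightarrow T(s, t, \varphi)$ is valid.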


Proof. First suppose $\varphi \Rightarrow T(s, t, \varphi)$ is valid. Let $\sigma$ be a substitution with
$\sigma \vDash \varphi$. Since $\sigma(x) \in \mathsf{Val}$ for all $x \in \mathsf{Var}(\varphi)$, we can apply $\sigma$ to the formula
$\varphi \Rightarrow T(s, t, \varphi)$. We obtain $[\![\varphi\sigma]\!] = \top$ from $\sigma \vDash \varphi$. Hence also $[\![T(s, t, \varphi)\sigma]\!] = \top$.
Since $T(s, t, \varphi)$ is a conjunction, the final case in the definition of $T(s, t, \varphi)$ is
not used. Hence $\mathsf{Pos}(s) = \mathsf{Pos}(t)$, $s(p) = t(p)$ for all internal positions $p$ in $s$ and
$t$, and $s|_p\sigma = t|_p\sigma$ for all leaf positions $p$ in $s$ and $t$. Consequently, $s\sigma = t\sigma$. This
concludes the triviality proof of $s \approx t\ [\varphi]$.

For the only if direction, suppose $s \approx t\ [\varphi]$ is trivial. Note that the variables
appearing in the formula $\varphi \Rightarrow T(s, t, \varphi)$ are those of $\varphi$. Let $\sigma$ be an arbitrary
assignment such that $[\![\varphi\sigma]\!] = \top$. We need to show $[\![T(s, t, \varphi)\sigma]\!] = \top$. We can
view $\sigma$ as a substitution with $\sigma(x) \in \mathsf{Val}$ for all $x \in \mathsf{Var}(\varphi)$. We have $\sigma \vDash \varphi$ and
thus $s\sigma = t\sigma$ by the triviality of $s \approx t\ [\varphi]$. Hence $T(s, t, \varphi)$ is a conjunction of
equations between values and variables in $\varphi$, which are turned into identities by
$\sigma$. Hence $[\![T(s, t, \varphi)\sigma]\!] = \top$ as desired.

³ https://hackage.haskell.org/package/term-rewriting-0.4.0.2.
⁴ http://cl-informatik.uibk.ac.at/software/crest/.
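
The construction of $T(s, t, \varphi)$ is easy to prototype. The sketch below is not the code of crest: it assumes a hypothetical encoding (variables as strings, values as Python integers, applications as tuples), represents the constraint by a Python predicate together with its variable set, and replaces the SMT validity check by exhaustive testing over a small integer range, so a positive answer is only evidence within that range.

```python
from itertools import product

def T(s, t, cvars):
    """The formula T(s, t, phi) of Definition 11; cvars stands for Var(phi)."""
    def logical(u):                       # u is a value or a variable of phi
        return isinstance(u, int) or (isinstance(u, str) and u in cvars)
    if s == t:
        return True
    if logical(s) and logical(t):
        return ("eq", s, t)
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return ("and", [T(a, b, cvars) for a, b in zip(s[1:], t[1:])])
    return False

def holds(formula, env):
    """Evaluate a formula produced by T under the assignment env."""
    if isinstance(formula, bool):
        return formula
    if formula[0] == "eq":
        value = lambda u: u if isinstance(u, int) else env[u]
        return value(formula[1]) == value(formula[2])
    return all(holds(g, env) for g in formula[1])

def trivial(s, t, cvars, phi, domain=range(-3, 4)):
    """Bounded stand-in for checking validity of phi => T(s, t, phi)."""
    formula = T(s, t, cvars)
    names = sorted(cvars)
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if phi(env) and not holds(formula, env):
            return False
    return True
```

For instance, `trivial("x", "y", {"x", "y"}, lambda e: e["x"] >= e["y"] and e["y"] >= e["x"])` succeeds, matching the first critical pair of Example 4, whereas the final equation of Example 8 is rejected because `x` and `y` are not variables of the constraint.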


The second challenge is how to implement rewriting on constrained equations,
in particular how to deal with the equivalence relation $\sim$ defined in Definition 3.


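Example 13. The LCTRS $\mathcal{R}$

\[
f(x) \to z\ [z = 3] \qquad g(f(x)) \to a \qquad g(3) \to a
\]

over the integers admits two critical pairs:

\[
z \approx z'\ [z = 3 \land z' = 3] \qquad g(z) \approx a\ [z = 3]
\]

The first one is trivial, but to join the second one, an initial equivalence step is
required:

\[
g(z) \approx a\ [z = 3] \sim g(3) \approx a\ [z = 3] \to_{\mathcal{R}} a \approx a\ [z = 3]
\]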


The transformation introduced below avoids having to look for an initial 
equivalence step before a rule becomes applicable. 


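Definition 12. Let $\mathcal{R}$ be an LCTRS. Given a term $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, we replace
values in $t$ by fresh variables and return the modified term together with the
constraint that collects the bindings:

\[
\mathsf{tf}(t) = \begin{cases}
(t, \mathsf{true}) & \text{if } t \in \mathcal{V} \\
(z, z = t) & \text{if } t \in \mathsf{Val} \text{ and } z \text{ is a fresh variable} \\
(f(s_1,\dots,s_n), \varphi_1 \land \dots \land \varphi_n) & \text{if } t = f(t_1,\dots,t_n) \text{ and } \mathsf{tf}(t_i) = (s_i, \varphi_i)
\end{cases}
\]

Applying the transformation $\mathsf{tf}$ to the left-hand sides of the rules in $\mathcal{R}$ produces

\[
\mathsf{tf}(\mathcal{R}) = \{\ell' \to r\ [\varphi \land \psi] \mid \ell \to r\ [\varphi] \in \mathcal{R} \text{ and } \mathsf{tf}(\ell) = (\ell', \psi)\}
\]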


Example 14. Applying the transformation $\mathsf{tf}$ to the LCTRS $\mathcal{R}$ of Example 13
produces the rules

\[
f(x) \to z\ [z = 3] \qquad g(f(x)) \to a \qquad g(z) \to a\ [z = 3]
\]


The critical pair g(z) ~ a [z = 3] can now be joined by an application of the 
modified third rule. Note that the modified rule does not overlap with the second 
rule because z may not be instantiated with f(x). Hence the modified LCTRS 
tf(R) is strongly closed and, because it is linear, also confluent. 
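
The transformation $\mathsf{tf}$ is a straightforward recursion over the left-hand side. The following sketch assumes the same hypothetical encoding as before (variables as strings, values as integers, applications as tuples, constants as one-element tuples) and represents a constraint as a list of `(variable, value)` equations read as a conjunction. Fresh variables are named `z1`, `z2`, ...; the sketch assumes these names do not already occur in the rules.

```python
from itertools import count

def tf(t, fresh):
    """Replace values in t by fresh variables; return (term, bindings)."""
    if isinstance(t, str):                       # variable: unchanged
        return t, []
    if isinstance(t, int):                       # value: abstract it
        z = f"z{next(fresh)}"
        return z, [(z, t)]                       # records the binding z = t
    args, bindings = [], []
    for arg in t[1:]:
        s, phi = tf(arg, fresh)
        args.append(s)
        bindings.extend(phi)
    return (t[0],) + tuple(args), bindings

def tf_rules(rules):
    """Apply tf to the left-hand side of each rule (lhs, rhs, constraint)."""
    fresh = count(1)
    out = []
    for lhs, rhs, constraint in rules:
        new_lhs, bindings = tf(lhs, fresh)
        out.append((new_lhs, rhs, constraint + bindings))
    return out

example13 = [
    (("f", "x"), "z", [("z", 3)]),
    (("g", ("f", "x")), ("a",), []),
    (("g", 3), ("a",), []),
]
```

Only the third rule of Example 13 changes: its left-hand side `g(3)` becomes `g(z1)` with the added binding `z1 = 3`, which corresponds to the rule $g(z) \to a\ [z = 3]$ of Example 14 up to renaming.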


In the following we show the correctness of the transformation. In particular
we prove that the initial rewrite relation is preserved.

Table 1. Specific experimental results.

                             result   method                  time (in ms)
[12, Example 23]             Timeout  -                           10017.70
[12, Example 23] corrected   YES      strongly closed               103.71
Example 6                    YES      orthogonal                     34.35
[8, Example 3]               YES      weakly orthogonal              50.87
Example 1                    YES      strongly closed               115.33
[10, Example 1]              YES      strongly closed              3806.84
Example 11                   YES      parallel closed                38.42
Example 12                   YES      almost parallel closed        130.36


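Lemma 4. The relations $\to_{\mathcal{R}}$ and $\to_{\mathsf{tf}(\mathcal{R})}$ coincide on unconstrained terms.

Proof. Consider $s, t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$. Since the transformation $\mathsf{tf}$ does not affect
calculation steps, it suffices to consider rule steps. First assume $s = C[\ell\sigma] \to_{\mathsf{ru}} C[r\sigma] = t$
by applying the rule $\ell \to r\ [\varphi] \in \mathcal{R}$ and let $\ell' \to r\ [\varphi'] \in \mathsf{tf}(\mathcal{R})$ be its
transformation. So $\mathsf{tf}(\ell) = (\ell', \psi)$ and $\varphi' = \varphi \land \psi$. Define the substitution

\[
\sigma' = \{\ell'|_p \mapsto \ell|_p \mid (\ell', \psi) = \mathsf{tf}(\ell),\ p \in \mathsf{Pos}(\ell) \text{ and } \ell|_p \in \mathsf{Val}\}
\]

and let $\tau = \sigma \cup \sigma'$. Since $\mathsf{Dom}(\sigma) \cap \mathsf{Dom}(\sigma') = \varnothing$ by construction, $\tau$ is well-defined.
From $\sigma \vDash \ell \to r\ [\varphi]$ and $\sigma' \vDash \psi$ we immediately obtain $\tau \vDash \ell' \to r\ [\varphi']$,
which yields $s = C[\ell'\tau] \to_{\mathsf{ru}} C[r\tau] = t$ in $\mathsf{tf}(\mathcal{R})$.

For the other direction consider $s = C[\ell'\sigma] \to_{\mathsf{ru}} C[r'\sigma] = t$ by applying the
rule $\ell' \to r'\ [\varphi'] \in \mathsf{tf}(\mathcal{R})$. The difference between $\ell'$ and its originating left-hand
side $\ell$ in $\mathcal{R}$ is that value positions in $\ell$ are occupied by fresh variables in
$\ell'$. Because $\sigma$ respects $\varphi' = \varphi \land \psi$, $\sigma$ substitutes the required values at these
positions in $\ell'$. As $\sigma \vDash \ell' \to r'\ [\varphi']$, there exists a rule $\ell \to r\ [\varphi]$ which is respected
by $\sigma$ and thus $s = C[\ell\sigma] \to_{\mathsf{ru}} C[r\sigma] = t$ in $\mathcal{R}$.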


As the transformation is used in the implementation and rewriting on con- 
strained terms plays a key role, the following result is needed. The proof is similar 
to the first half of the proof of Lemma 4 and omitted. 


Lemma 5. The inclusion $\stackrel{\sim}{\to}_{\mathcal{R}} \subseteq \stackrel{\sim}{\to}_{\mathsf{tf}(\mathcal{R})}$ holds on constrained terms.


6 Experimental Results 


In order to evaluate our tool we performed some experiments. As there is no
official database of interesting confluence problems for LCTRSs, we collected
several LCTRSs from the literature and the repository of Ctrl. The problem files
in the latter that contain an equivalence problem of two functions for rewriting
induction were split into two separate files. The experiments were performed
on an AMD Ryzen 7 PRO 4750U CPU with a base clock speed of 1.7 GHz, 8
cores and 32 GB of RAM. The full set of benchmarks consists of 127 problems
of which crest can prove 90 confluent, 11 result in MAYBE and 26 in a timeout.
With a timeout of 5s crest needs 141.09s to analyze the set of benchmarks.
We have tested the implementation with 3 well-known SMT solvers: Z3, Yices
and CVC5. Among those Z3 gives the best performance regarding time and the
handling of non-linear arithmetic. Hence we use Z3 as the default SMT solver in
our implementation. In Table 1 we list some interesting systems from this paper
and the relevant literature. Full details are available from the website of crest.
We choose 5 as the maximum number of steps in the $\to^*$ parts of the strongly
closed and almost parallel closed criteria.

Table 2. Comparison between confluence criteria implemented in crest.

                                O    W    S    P    A
orthogonality (O)              74   74   11   74   74
weak orthogonality (W)              78   13   78   78
strongly closed (S)                      20   16   20
parallel closed (P)                           83   83
almost parallel closed (A)                         89

From Table 2 the relative power of each implemented confluence criterion on 
our benchmark can be inferred, i.e., it depicts how many of the 127 problems 
both methods can prove confluent. This illustrates that the relative applicability 
in theory (e.g., weakly orthogonal LCTRSs are parallel closed), is preserved in 
our implementation. We conclude this section with an interesting observation 
discovered by crest when testing [12, Example 23]. 
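
The way Table 2 is read can be spelled out in a few lines. In the sketch below the figures are copied from Table 2; the dictionary layout and the helper name `subsumed_by` are assumptions made for illustration. An entry for a pair of criteria counts the problems proved confluent by both, so one criterion is subsumed by another on this benchmark exactly when the shared count equals its own diagonal entry.

```python
# Entry (a, b), with a before b in the order O, W, S, P, A: number of
# problems proved confluent by both criteria (figures from Table 2).
both = {
    ("O", "O"): 74, ("O", "W"): 74, ("O", "S"): 11, ("O", "P"): 74, ("O", "A"): 74,
    ("W", "W"): 78, ("W", "S"): 13, ("W", "P"): 78, ("W", "A"): 78,
    ("S", "S"): 20, ("S", "P"): 16, ("S", "A"): 20,
    ("P", "P"): 83, ("P", "A"): 83,
    ("A", "A"): 89,
}

def subsumed_by(a, b):
    """True if every benchmark problem solved by criterion a is also solved
    by criterion b (a must precede b in the order O, W, S, P, A)."""
    return both[(a, b)] == both[(a, a)]

# Outcome counts reported for the full benchmark set
outcomes = {"confluent": 90, "MAYBE": 11, "timeout": 26}
```

For example, `subsumed_by("W", "P")` holds, in line with the remark that weakly orthogonal LCTRSs are parallel closed, while `subsumed_by("S", "P")` does not, since only 16 of the 20 strongly closed problems are also parallel closed.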

We also tested the applicability of Corollary 1, using the tool Ctrl as a black 
box for proving termination. Of the 127 problems, Ctrl claims 102 to be termi- 
nating and 67 of those can be shown locally confluent by crest, where we limit 
the number of steps in the joining sequence to 100. It is interesting to note that 
all of these problems are orthogonal, and so proving termination and finding a 
joining sequence is not necessary to conclude confluence, on the current set of 
problems. Of the remaining 35 problems, crest can show confluence of 5 of these 
by almost parallel closedness. 


Example 15. The LCTRS $\mathcal{R}$ is obtained by completing a system consisting of
four constrained equations:

1. $f(x, y) \to f(z, y) + 1\ [x \geq 1 \land z = x - 1]$
2. $f(x, 0) \to g(1, x)\ [x \leq 1]$
3. $g(0, y) \to y\ [x \leq 0]$
4. $g(1, 1) \to g(1, 0) + 1$
5. $h(x) \to g(1, x) + 1\ [x \leq 1]$
6. $h(x) \to f(x - 1, 0) + 2\ [x \geq 1]$

Calling crest on $\mathcal{R}$ results in a timeout. As a matter of fact, the LCTRS is not
confluent because the critical pair

\[
g(1, x) + 1 \approx f(x - 1, 0) + 2\ [x \leq 1 \land x \geq 1]
\]

between rules 5 and 6 is not joinable. Inspecting the steps in [12, Example 23]
reveals some incorrect applications of the inference rules of constrained completion,
which causes rule 6 to be wrong. Replacing it with the correct rule

6'. $h(x) \to (f(z, 0) + 1) + 1\ [x \geq 1 \land z = x - 1]$

causes crest to report confluence by strong closedness.


7 Concluding Remarks 


In this paper we presented new confluence criteria for LCTRSs as well as a new 
tool in which these criteria have been implemented. We clarified the subtleties 
that arise when analyzing joinability of critical pairs in LCTRSs and reported 
experimental results. 

For plain rewrite systems many more confluence criteria are known and imple- 
mented in powerful tools that compete in the yearly Confluence Competition 
(CoCo). In the near future we will investigate which of these can be lifted to
LCTRSs. We will also advance the creation of a competition category on con- 
fluence of LCTRSs in CoCo. 

Our tool crest currently has no support for termination. Implementing
termination techniques in crest is of clear interest. The methods reported in
[6, 7, 12] are the starting point here. Many LCTRSs coming from applications are
actually non-confluent. So developing more powerful techniques for LCTRSs is on
our agenda as well.


Acknowledgments. We thank Fabian Mitterwallner for valuable discussions on the 
presented topics and our Haskell implementation. The detailed comments by the 
reviewers improved the presentation. Cynthia Kop and Deivid Vale kindly provided 
Abstract. Using Isabelle/HOL, we verify the state-of-the-art decision
procedure for multi-level syllogistic with singleton (MLSS for short), 
which is a quantifier-free fragment of set theory. We formalise its syntax 
and semantics as well as a sound and complete tableau calculus for it. 
We also provide an executable specification of a decision procedure that 
exhaustively applies the rules of the calculus and prove its termination. 
Furthermore, we extend the calculus with a lightweight type system that 
paves the way for an integration of the procedure into Isabelle/HOL. 


Keywords: Decision procedures · Semantic tableaux · Interactive
theorem proving · Set theory


1 Introduction 


In Isabelle/HOL, there are specialised procedures for dealing with e.g. natural 
numbers, linear arithmetic, and metric spaces. Some of these procedures have 
been verified in Isabelle/HOL, such as a procedure for Presburger arithmetic [12] 
that was later extended to mixed real-integer arithmetic [11]. This procedure, 
though, uses reflection to work on goals in Isabelle/HOL, which, during execu- 
tion, either sacrifices speed by going through the simplifier or requires trusting 
the code generator. More recently, Stevens and Nipkow [25] presented a verified 
decision procedure for orders that produces certificates. This approach offers 
efficient execution by using generated code as well as soundness because the 
certificates are replayed through Isabelle’s inference kernel. 

This paper focuses on another ubiquitous structure in mathematics, namely 
sets. To the best of our knowledge, we present the first formally verified decision 
procedure for (a fragment of) set theory. In particular, we consider a quantifier- 
free fragment which Cantone and Zarba [9] call multi-level syllogistic with single- 
ton (MLSS). The fragment includes the usual set operations of union, intersec- 
tion, difference, membership, equality and, in addition, it allows the construction 
of singleton sets. 

Since MLSS admits a tableau calculus, generating certificates will be
straightforward. As with the aforementioned order solver, this paves the way
for an integration of the decision procedure into Isabelle, adding to its growing
body of verified decision procedures.


1.1 Contributions 


We present a formalisation in Isabelle/HOL of a tableau calculus for MLSS due 
to Cantone and Zarba [9] [7, Chapter 14]. We prove soundness and complete- 
ness of the calculus and give an abstract specification of a decision procedure 
that exhaustively applies the rules of the calculus. To obtain total correctness 
of the procedure, we prove its termination. Additionally, we naively refine the 
abstract specification to an executable one from which we can generate code. The
formalisation initially follows the paper but offers a more thorough account of 
some important details: 


— We deliver the omitted proof of Lemma 2 in the paper [9], a key building 
block for the completeness proof of the calculus. 

— The formal proof of completeness reveals that the calculus lacks a rule for 
eliminating double negation. 

— We derive an explicit upper bound for the number of formulas in a tableau 
branch. 


In the context of Isabelle/HOL, there is one crucial aspect that requires us 
to modify the calculus in the paper: the calculus works under the assumption 
that every variable is a set; however, this is not the case in Isabelle/HOL, e.g. 
consider the expression n ∈ A where n is a natural number. We call such
variables urelements. To deal with them, we extend the calculus with a lightweight
type system and a verified inference algorithm that identifies the urelements.

The modification of the calculus required non-trivial changes to the complete- 
ness proof. Here, the formalisation was instrumental because Isabelle immedi- 
ately revealed which proofs had been broken. This illustrates the usefulness of 
ITPs for developing logic calculi: they allow us to confidently make modifications 
without compromising correctness. 

All in all, the formalisation amounts to over 6000 lines of theory. It is part of 
the Archive of Formal Proofs (AFP) [24]. The entry provides an overview theory 
MLSS_Proc_All.thy that highlights the (mostly syntactic) differences between 
paper and formalisation and references the constants and theorems that are 
introduced in this paper. 


1.2 Related Work 


Since the literature on decidable fragments of set theory is vast, we only focus on 
MLSS here. Ferro et al. [14] were the first to show the decidability of the frag- 
ment. Subsequent work [6] found the decision problem to be NP-complete. To 
obtain a practical decision procedure, Cantone [4] proposed a tableau calculus, 
which was later improved by Beckert and Hartmer [1]. Both of these procedures 
construct a model during execution that guides the proof search. Beckert and 
Hartmer also cover an extension of the calculus with uninterpreted functions,
which Cantone and Zarba [10] later revisited while avoiding the construction 
of a model during execution. In this paper, we consider a version of the latter 
procedure due to Cantone and Zarba [9] that is specialised to MLSS and where 
the branching rules of the calculus are set up to guarantee the mutual exclusivity 
of the branches. Later extensions of the calculus added certain interpreted func- 
tions, such as monotone functions [8] and the inverse of a function [5]. The latter 
extension notably includes the Cartesian product. Those extensions, though, did 
not improve upon the tableau calculus for MLSS. 

There is a large body of work at the intersection of ITPs and tableau methods, 
but to keep with this paper’s theme we only consider formalisations of correctness 
here. For first-order logic, there are abstract completeness proofs using the Beth- 
Hintikka style of possibly infinite derivation trees [3] as well as the Henkin style 
of maximally consistent sets [17]. Both are abstract enough to be instantiated 
with a wide range of concrete calculi. A more concrete formalisation [19] verifies a 
sequent calculus for first-order logic whose completeness proof is via a translation 
to semantic tableau. 

Beyond completeness, we target decidability, which is more attainable for 
propositional logic. There is a verified tableau calculus for the modal logic S5 [2] 
in Lean and one for hybrid logic [18] in Isabelle/HOL. Neither of these proves
termination, but there is a formalisation of a tableau calculus for the temporal
logic CTL in Coq [13] that does.


1.3 Notation 


Isabelle/HOL [21] conforms to everyday mathematical notation for the most 
part. We establish notation and in particular some essential data types together 
with their primitive operations that are specific to Isabelle/HOL. 

We write t :: 'a to specify that the term t has the type 'a and 'a ⇒ 'b
for the space of total functions from type 'a to type 'b.

Sets with elements of type 'a have the type 'a set. The cardinality of a set
A is denoted by |A| and the image of A under f by f ` A.

We use 'a list to describe the type of lists, which are constructed using
the empty list [] constructor or the infix cons constructor #, and are appended
with the infix operator @. The function set converts a list into a set.

We remark that ⟷ is equivalent to = on the type of Booleans bool and
≡ is definitional equality of the meta-logic of Isabelle/HOL, which is called
Isabelle/Pure. Meta-implication is denoted by ⟹ and a chain of implications
A₁ ⟹ … ⟹ Aₙ ⟹ C can be abbreviated by ⟦A₁; …; Aₙ⟧ ⟹ C.


2 Syntax and Semantics of MLSS 


2.1 Syntax 


At the heart of MLSS, we have the type of set terms, which is the disjoint 
union of the empty set and variables as well as the operations union, intersection, 
difference, and the singleton set represented by the constructor Single. We keep
the type of variables abstract by making it a parameter of the set term data 
type. The only restriction on the type of variables is that it needs to be infinite. 
Isabelle/HOL’s data type package automatically defines a function that gives us 
the set of variables in a set term, which we name vars. In what follows, we will 
overload the function vars to also work on set atoms, formulas, and branches. 


datatype (vars: 'a) pset_term =
  ∅ | Var 'a | Single ('a pset_term)
  | 'a pset_term ⊔ₛ 'a pset_term
  | 'a pset_term ⊓ₛ 'a pset_term
  | 'a pset_term −ₛ 'a pset_term


We can combine two set terms to form a set atom by using the membership or 
the equality operator. 


datatype (vars: 'a) pset_atom =
  'a pset_term ∈ₛ 'a pset_term
  | 'a pset_term =ₛ 'a pset_term


With the above operators we can also represent the subset operator ⊆ₛ and
enumerate finite sets: s ⊆ₛ t is equivalent to s ⊔ₛ t =ₛ t and a finite set of
elements {t₁, …, tₙ} can be expressed by Single t₁ ⊔ₛ … ⊔ₛ Single tₙ.
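
To make the two encodings concrete, the following minimal Python sketch mirrors the set-term and set-atom data types and builds the subset and finite-set encodings. The class names follow the constructors above; the helper names subset, finite_set and term_vars, as well as the choice to encode an empty enumeration as Empty, are assumptions made for illustration and are not taken from the formalisation.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Empty:
    """The empty-set constant."""


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Single:
    elem: object


@dataclass(frozen=True)
class Union:
    left: object
    right: object


@dataclass(frozen=True)
class Inter:
    left: object
    right: object


@dataclass(frozen=True)
class Diff:
    left: object
    right: object


@dataclass(frozen=True)
class Member:
    elem: object
    container: object


@dataclass(frozen=True)
class Eq:
    left: object
    right: object


def subset(s, t):
    """Encode 's is a subset of t' as the atom  s union t = t."""
    return Eq(Union(s, t), t)


def finite_set(terms):
    """Encode {t1, ..., tn} as a union of singletons (Empty if n = 0).

    The unions are nested to the left here; the nesting direction does not
    affect the meaning of the term.
    """
    singles = [Single(t) for t in terms]
    return reduce(Union, singles) if singles else Empty()


def term_vars(t):
    """Collect the variable names occurring in a set term."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Single):
        return term_vars(t.elem)
    if isinstance(t, (Union, Inter, Diff)):
        return term_vars(t.left) | term_vars(t.right)
    return set()  # Empty has no variables


a, b = Var("a"), Var("b")
print(subset(a, b))
print(finite_set([a, b, Empty()]))
```

Frozen dataclasses give structural equality and hashing for free, which is convenient when terms are later stored in sets of subterms.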

We use the propositional fragment of formulas due to Nipkow [20] with set 
atoms as propositional atoms to form the quantifier-free fragment MLSS of set 
theory. 


datatype (atoms: 'a) fm =
  A 'a
  | ¬ ('a fm)
  | 'a fm ∧ 'a fm
  | 'a fm ∨ 'a fm

type_synonym ’a pset_fm = ’a pset_atom fm 


We will often drop the atom constructor A to reduce clutter. Additionally, we use
s ∉ₛ t and s ≠ₛ t to denote ¬ A (s ∈ₛ t) and ¬ A (s =ₛ t), respectively.

Similarly to vars, we get the function atoms :: 'a fm ⇒ 'a set for free
that retrieves all set atoms in a formula. We combine these functions to extract
all the variables occurring in a set formula.


definition vars φ ≡ ⋃(vars ` atoms φ)


Likewise, we fix the constant subterms :: 'b ⇒ 'a pset_term set that is
polymorphic in its argument type 'b. We overload this constant to return the set
terms that are subterms of a set term, set atom, or formula, respectively. Lastly,
we introduce the function subfms :: 'a fm ⇒ 'a fm set that computes the
subformulas of a formula. The functions subterms and subfms are implemented
in the expected way.


2.2 Semantics


The original paper [9] bases the semantics of MLSS on the von Neumann hier- 
archy of sets V. We instead use the hierarchy of hereditarily finite sets (HF sets) 
which fulfil all the same axioms as V — that is, the axioms of ZF — except for the 
axiom of infinity. In particular, the membership relation is well-founded. The HF 
sets, as we will see, are sufficient to construct a model for any satisfiable MLSS 
formula. In contrast to V, the HF sets are directly representable in Isabelle/HOL, 
and indeed, an AFP entry [23] formalises them. The entry defines a type hf that 
comes with the following functionality: 


— The function HF :: hf set ⇒ hf that converts a finite set of HF sets into
an HF set.

— The usual set operations such as equality (=), membership (∈), union (⊔),
intersection (⊓), and difference (−) are defined.

— Finally, the empty set coincides with the ordinal 0, so it is denoted by
0 :: hf.


Equipped with the above, we define the interpretation functions 


— Iₛₜ :: ('a ⇒ hf) ⇒ 'a pset_term ⇒ hf and
— Iₛₐ :: ('a ⇒ hf) ⇒ 'a pset_atom ⇒ bool


in the standard way, i.e. by mapping each syntactic construct to the correspond- 
ing operation on HF sets and interpreting variables with respect to a given 
valuation function M :: 'a ⇒ hf. For the concrete definition we refer to the
formalisation. 
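
As a rough illustration of what "the standard way" amounts to, the sketch below interprets set terms, atoms and formulas over nested Python frozensets, which stand in for HF sets. The tuple encoding of the syntax and the names interp_term, interp_atom and holds are assumptions of this sketch; the concrete Isabelle definitions are in the formalisation.

```python
def interp_term(M, t):
    """Interpret a set term as a nested frozenset under the valuation M."""
    tag = t[0]
    if tag == "empty":
        return frozenset()
    if tag == "var":
        return M[t[1]]
    if tag == "single":
        return frozenset([interp_term(M, t[1])])
    left, right = interp_term(M, t[1]), interp_term(M, t[2])
    if tag == "union":
        return left | right
    if tag == "inter":
        return left & right
    if tag == "diff":
        return left - right
    raise ValueError(f"unknown set term: {t!r}")


def interp_atom(M, atom):
    """Interpret a membership or equality atom as a truth value."""
    tag, s, t = atom
    if tag == "mem":
        return interp_term(M, s) in interp_term(M, t)
    if tag == "eq":
        return interp_term(M, s) == interp_term(M, t)
    raise ValueError(f"unknown atom: {atom!r}")


def holds(M, fm):
    """Decide whether the formula fm holds under the valuation M."""
    tag = fm[0]
    if tag == "atom":
        return interp_atom(M, fm[1])
    if tag == "not":
        return not holds(M, fm[1])
    if tag == "and":
        return holds(M, fm[1]) and holds(M, fm[2])
    if tag == "or":
        return holds(M, fm[1]) or holds(M, fm[2])
    raise ValueError(f"unknown formula: {fm!r}")


# Valuation: x is the empty set, y is the singleton containing the empty set.
M = {"x": frozenset(), "y": frozenset([frozenset()])}
x, y = ("var", "x"), ("var", "y")
phi = ("and", ("atom", ("mem", x, y)), ("not", ("atom", ("eq", x, y))))
print(holds(M, phi))
```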

We write M ⊨ φ for the judgement that the formula φ holds under the
valuation function M. The implementation of ⊨ coincides with the interpretation
function of Nipkow [20]. As usual, we call a formula φ satisfiable if there exists
a model M with M ⊨ φ. Otherwise, we say that φ is unsatisfiable.


3 A Tableau Calculus for MLSS 


We formalise the tableau calculus for MLSS as described by Cantone and 
Zarba [9]. Inspired by the formalisation of a tableau calculus for hybrid logic 
by From [16], we use lists to represent the branches of the tableau tree. Note 
that we add formulas to the front of the list during branch expansion, so last b 
for a branch b is always the formula we are trying to disprove with the tableau. 
We sometimes call this formula the initial formula. 


type_synonym ’a branch = ’a pset_fm list 


We lift the functions vars and subterms to branches in the expected way. 

In the standard tableau calculus for propositional logic as Fitting [15] 
describes it, a branch is called closed if it contains both the negation of a formula 
and the formula itself; conversely, it is called open if it is not closed. For MLSS, 
we extend the notion of closedness with three additional rules; the first two are 
straightforward while the last one states that a branch is closed when the branch 
contains a membership cycle t₀ ∈ₛ t₁, t₁ ∈ₛ t₂, …, tₙ ∈ₛ t₀.


Table 1. Linear expansion rules. All rules except the double negation rule coincide
with the original paper [9]. For brevity, we omit the rules for ⊓ₛ and −ₛ.

Propositional Rules            Rules for ⊔ₛ
p ∧ q ⟹ p, q                   s ∉ₛ t₁ ⊔ₛ t₂ ⟹ s ∉ₛ t₁, s ∉ₛ t₂
¬(p ∨ q) ⟹ ¬p, ¬q              s ∈ₛ t₁ ⟹ s ∈ₛ t₁ ⊔ₛ t₂
p ∨ q, ¬p ⟹ q                  s ∈ₛ t₂ ⟹ s ∈ₛ t₁ ⊔ₛ t₂
p ∨ q, ¬q ⟹ p                  s ∈ₛ t₁ ⊔ₛ t₂, s ∉ₛ t₁ ⟹ s ∈ₛ t₂
¬(p ∧ q), p ⟹ ¬q               s ∈ₛ t₁ ⊔ₛ t₂, s ∉ₛ t₂ ⟹ s ∈ₛ t₁
¬(p ∧ q), q ⟹ ¬p               s ∉ₛ t₁, s ∉ₛ t₂ ⟹ s ∉ₛ t₁ ⊔ₛ t₂
¬(¬p) ⟹ p

Rules for Single               Rules for =ₛ
⟹ s ∈ₛ Single s                t₁ =ₛ t₂, l ⟹ l{t₂/t₁}
s ∈ₛ Single t ⟹ s =ₛ t         t₁ =ₛ t₂, l ⟹ l{t₁/t₂}
s ∉ₛ Single t ⟹ s ≠ₛ t         s₁ ∈ₛ t, s₂ ∉ₛ t ⟹ s₁ ≠ₛ s₂


inductive bclosed :: 'a branch ⇒ bool where
  ⟦ φ ∈ set b; ¬ φ ∈ set b ⟧ ⟹ bclosed b
| (t ∈ₛ ∅) ∈ set b ⟹ bclosed b
| (t ≠ₛ t) ∈ set b ⟹ bclosed b
| ⟦ member_cycle cs; set cs ⊆ set b ⟧ ⟹ bclosed b


abbreviation bopen b ≡ ¬ bclosed b


A tableau is called closed if all of its branches are closed. 
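
The four closedness conditions can be checked mechanically. The sketch below is one possible executable reading, assuming a tuple encoding of formulas and representing a branch as a Python list; the function names and the depth-first search used to find membership cycles are choices of this sketch, not the definition of member_cycle in the formalisation.

```python
def has_membership_cycle(branch):
    """Detect a cycle t0 in t1, t1 in t2, ..., tn in t0 among positive literals."""
    edges = {}
    for f in branch:
        if f[0] == "atom" and f[1][0] == "mem":
            _, s, t = f[1]
            edges.setdefault(s, set()).add(t)

    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(v):
        colour[v] = GREY  # v is on the current search path
        for w in edges.get(v, ()):
            c = colour.get(w, WHITE)
            if c == GREY or (c == WHITE and visit(w)):
                return True
        colour[v] = BLACK
        return False

    return any(colour.get(v, WHITE) == WHITE and visit(v) for v in list(edges))


def bclosed(branch):
    """Check the four closedness conditions on a branch."""
    formulas = set(branch)
    for f in formulas:
        # 1. A formula together with its negation.
        if ("not", f) in formulas:
            return True
        # 2. A literal claiming membership in the empty set.
        if f[0] == "atom" and f[1][0] == "mem" and f[1][2] == ("empty",):
            return True
        # 3. A literal claiming that a term differs from itself.
        if (f[0] == "not" and f[1][0] == "atom"
                and f[1][1][0] == "eq" and f[1][1][1] == f[1][1][2]):
            return True
    # 4. A membership cycle.
    return has_membership_cycle(branch)


x, y, z = ("var", "x"), ("var", "y"), ("var", "z")


def mem(s, t):
    return ("atom", ("mem", s, t))


print(bclosed([mem(x, y), mem(y, z), mem(z, x)]))
print(bclosed([mem(x, y), mem(y, z)]))
```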


3.1 Linear Expansion Rules 


The calculus considers two kinds of branch expansion rules: linear and branch- 
ing rules. As the name suggests, branching rules lead to the creation of new 
branches in the tableau while linear rules only extend a branch b with new
formulas b' = [ψ₁, …, ψₙ], which we denote by b' > b. Table 1 shows the linear
expansion rules. Note that in the first two rules for =ₛ, l is a literal occurring
in the branch. Furthermore, the term-for-term substitution l{s/t} is restricted
to the top-level set terms of l, i.e. the set terms that occur directly under one
of the atom constructors ∈ₛ or =ₛ; for example, given the literal


l = ¬ ((s ⊔ₛ u) −ₛ s =ₛ s ⊔ₛ u)


we have


(¬ ((s ⊔ₛ u) −ₛ s =ₛ s ⊔ₛ u)){t/s ⊔ₛ u}
  = ¬ ((s ⊔ₛ u) −ₛ s =ₛ t).


Table 2. Branching expansion rules. We write φ for last b here. All rules coincide
with the original paper [9] so we only show an illustrative subset.

Rule:                   p | ¬p
Precondition:           (p ∨ q) ∈ set b
Subsumption condition:  p ∈ set b ∨ ¬p ∈ set b

Rule:                   s ∈ₛ t₁ | s ∉ₛ t₁
Precondition:           (s ∈ₛ t₁ ⊔ₛ t₂) ∈ set b, t₁ ⊔ₛ t₂ ∈ subterms φ
Subsumption condition:  (s ∈ₛ t₁) ∈ set b ∨ (s ∉ₛ t₁) ∈ set b

Rule:                   Var x ∈ₛ t₁, Var x ∉ₛ t₂ | Var x ∉ₛ t₁, Var x ∈ₛ t₂
Precondition:           (t₁ ≠ₛ t₂) ∈ set b, t₁ ∈ subterms φ, t₂ ∈ subterms φ,
                        x ∉ vars b
Subsumption condition:  (∃s. (s ∈ₛ t₁) ∈ set b ∧ (s ∉ₛ t₂) ∈ set b) ∨
                        (∃s. (s ∉ₛ t₁) ∈ set b ∧ (s ∈ₛ t₂) ∈ set b)


A more crucial restriction of the linear rules is that no new subterm may be 
created by their application; for instance, the second rule for Us is 


s ∈ₛ t₁ ⟹ s ∈ₛ t₁ ⊔ₛ t₂,


which formally represents


(s ∈ₛ t₁) ∈ set b ⟹ [s ∈ₛ t₁ ⊔ₛ t₂] > b,


and may only be used under the condition t₁ ⊔ₛ t₂ ∈ subterms (last b).
The purpose of this restriction is to prevent unbounded expansion of the branch. 
In fact, we give an explicit upper bound for the number of formulas in a branch 
in Sect. 7. 
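
The effect of the subterm restriction can be seen in a small sketch of the union-introduction rules. The tuple encoding and the names subterms_term, subterms_fm and union_intro are assumptions of this sketch; it only covers the second and third rules for union from Table 1 and takes the last element of the list as the initial formula.

```python
def subterms_term(t):
    """All set terms occurring in t, including t itself."""
    result = {t}
    for child in t[1:]:
        if isinstance(child, tuple):
            result |= subterms_term(child)
    return result


def subterms_fm(fm):
    """All set terms occurring in the atoms of a formula."""
    if fm[0] == "atom":
        _, s, t = fm[1]
        return subterms_term(s) | subterms_term(t)
    result = set()
    for sub in fm[1:]:
        result |= subterms_fm(sub)
    return result


def union_intro(branch):
    """New literals 's in t1 union t2' obtained from 's in t1' or 's in t2'.

    Only unions that already occur as subterms of the initial formula
    (the last element of the branch) may be introduced.
    """
    allowed = {t for t in subterms_fm(branch[-1]) if t[0] == "union"}
    present = set(branch)
    new = []
    for fm in branch:
        if fm[0] == "atom" and fm[1][0] == "mem":
            _, s, t = fm[1]
            for u in sorted(allowed, key=repr):
                literal = ("atom", ("mem", s, u))
                if t in (u[1], u[2]) and literal not in present and literal not in new:
                    new.append(literal)
    return new


a, b, x = ("var", "a"), ("var", "b"), ("var", "x")
initial = ("not", ("atom", ("mem", x, ("union", a, b))))
branch = [("atom", ("mem", x, a)), initial]
print(union_intro(branch))
```

Without the restriction, the rule could produce a literal for every conceivable union containing a, so the set of candidate formulas would be infinite; with it, only unions from the initial formula are considered.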

Due to boundedness, repeated expansion with linear rules eventually results 
in a linearly saturated branch, i.e. a branch where no application of linear rules 
would produce new formulas. 


definition lin_sat b ≡ ∀b'. b' > b ⟶ set b' ⊆ set b


Finally, we remark that the original paper [9] is missing the last propositional
rule dealing with double negation. This rule is required for completeness, though:
without it, the branch [¬¬¬p, p, ¬¬¬p ∧ p] is saturated (neither linear nor
branching rules apply) and open, but there clearly is no model for
the initial formula ¬¬¬p ∧ p.
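
The counterexample can be replayed with a small propositional sketch. It implements only the propositional linear rules of Table 1, with a flag that switches the double negation rule on or off; the tuple encoding and the function names are assumptions of this sketch, and branches are represented as sets rather than lists.

```python
def linear_new(formulas, double_negation):
    """Formulas derivable in one step by the propositional linear rules."""
    new = set()
    for f in formulas:
        if f[0] == "and":
            new |= {f[1], f[2]}
        elif f[0] == "or":
            if ("not", f[1]) in formulas:
                new.add(f[2])
            if ("not", f[2]) in formulas:
                new.add(f[1])
        elif f[0] == "not":
            g = f[1]
            if g[0] == "or":
                new |= {("not", g[1]), ("not", g[2])}
            elif g[0] == "and":
                if g[1] in formulas:
                    new.add(("not", g[2]))
                if g[2] in formulas:
                    new.add(("not", g[1]))
            elif g[0] == "not" and double_negation:
                new.add(g[1])
    return new - formulas


def saturate(initial, double_negation):
    """Apply the linear rules until no new formula can be added."""
    formulas = {initial}
    while True:
        new = linear_new(formulas, double_negation)
        if not new:
            return formulas
        formulas |= new


def closed(formulas):
    """Propositional closedness: some formula occurs with its negation."""
    return any(("not", f) in formulas for f in formulas)


p = ("atom", "p")
triple_neg = ("not", ("not", ("not", p)))
initial = ("and", triple_neg, p)

print(closed(saturate(initial, double_negation=False)))
print(closed(saturate(initial, double_negation=True)))
```

Without the rule, the triple negation is never reduced to the negation of p, so the branch stays open even though the initial formula is unsatisfiable.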


3.2 Branching Rules


After running out of linear rules to apply, only the branching rules shown in
Table 2 remain. A rule is applicable if its precondition is met and, to prevent
unnecessary branching, if it is not subsumed as indicated by the subsumption
condition. These rules create multiple branches in the tableau, so we represent
the different possibilities bs' to expand a branch b as a set and write bs' > b.
Accordingly, we get a new branch b' @ b in the tableau for each b' ∈ bs'.

A linearly saturated branch where no further branching is possible is called 
a saturated branch. 


definition sat b ≡ lin_sat b ∧ (∄bs'. bs' > b)


Note that even branching rules are defined such that they never create new 
subterms, except for the last rule that adds a new variable to the branch. These 
variables serve to manifest an inequality; hence, we call them witnesses. 


definition wits b ≡ vars b − vars (last b)


4 A Decision Procedure for MLSS 


The mechanics of the decision procedure are typical for a procedure based on a 
tableau calculus: it decides the satisfiability of a given formula φ by determining
whether the formula has a closed tableau. More specifically, it initialises the
tableau with the singleton branch [φ] and checks whether this branch can be
expanded to a closed tableau. 

We only discuss the abstract specification here and refer the reader to the 
formalisation for the executable specification. The implementation uses a couple 
of features of Isabelle/HOL’s function package: instead of defining the function 
via pattern matching, we specify the equations of the function as conditional 
rewrite rules. This requires us to prove that the assumptions of the equations 
are non-overlapping, which is done by automation. The other concern is that 
Isabelle/HOL requires functions to be total, so a recursive function needs to ter- 
minate for it to be well-defined; nevertheless, the termination proof is separated 
from the definition of the function for modularity. The function package main- 
tains the soundness of the definition by introducing a so-called domain predicate 
mlss_proc_branch_dom which characterises the arguments for which the func- 
tion terminates. Each equation of the function is guarded by an assumption that 
the predicate holds for the argument. In Sect. 7, we will show that the domain
predicate holds for the context in which the function mlss_proc_branch is
called. Before we go into more detail on how the termination is proved, we discuss
the definition of the function, as shown below.


function mlss_proc_branch :: 'a branch ⇒ bool where
  ¬ lin_sat b ⟹ mlss_proc_branch b =
    mlss_proc_branch ((SOME b'. b' > b ∧
      set b ⊂ set (b' @ b)) @ b)
| ⟦ lin_sat b; bclosed b ⟧ ⟹ mlss_proc_branch b = True
| ⟦ ¬ sat b; bopen b; lin_sat b ⟧ ⟹ mlss_proc_branch b =
    (∀b' ∈ (SOME bs. bs > b). mlss_proc_branch (b' @ b))
| ⟦ lin_sat b; sat b ⟧ ⟹ mlss_proc_branch b = bclosed b


definition mlss_proc :: 'a pset_fm ⇒ bool where
  mlss_proc φ ≡ mlss_proc_branch [φ]


The purpose of the function is to determine whether we can expand a given 
branch to a closed tableau. As stated before, we first use linear expansion rules 
in order to prevent premature branching; to this end, we recursively expand the 
branch with linear rules until the branch is linearly saturated. Note that we 
use Hilbert’s ε-operator in the form of SOME¹ to choose some rule that actually
adds new formulas to the branch. As soon as the branch is linearly saturated, 
we terminate if the branch is closed as the second equation shows. Otherwise, 
we choose an applicable branching rule and recursively check whether all newly 
created branches can be closed. The final equation applies once no further branch 
expansion is possible, in which case we just test for closedness of the branch. 

The procedure mlss_proc then calls mlss_proc_branch with a singleton
branch [φ] to determine the satisfiability of a given formula φ.
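
The control flow described above (saturate linearly, stop if closed, otherwise branch and require every new branch to close) can be sketched for the propositional fragment alone. This is not the MLSS procedure: set atoms are treated as opaque, branches are frozensets instead of lists, and the branching rule for a negated conjunction is an assumed dual of the rule for disjunction shown in Table 2. All names are illustrative.

```python
def is_closed(fs):
    """Propositional closedness: some formula occurs with its negation."""
    return any(("not", f) in fs for f in fs)


def linear_new(fs):
    """Formulas added by one round of the propositional linear rules."""
    new = set()
    for f in fs:
        if f[0] == "and":
            new |= {f[1], f[2]}
        elif f[0] == "or":
            if ("not", f[1]) in fs:
                new.add(f[2])
            if ("not", f[2]) in fs:
                new.add(f[1])
        elif f[0] == "not":
            g = f[1]
            if g[0] == "or":
                new |= {("not", g[1]), ("not", g[2])}
            elif g[0] == "and":
                if g[1] in fs:
                    new.add(("not", g[2]))
                if g[2] in fs:
                    new.add(("not", g[1]))
            elif g[0] == "not":
                new.add(g[1])
    return new - fs


def branching_options(fs):
    """First applicable branching rule that is not subsumed, or None."""
    for f in sorted(fs, key=repr):
        if f[0] == "or":
            p = f[1]
        elif f[0] == "not" and f[1][0] == "and":
            p = f[1][1]  # assumed dual rule for negated conjunctions
        else:
            continue
        if p not in fs and ("not", p) not in fs:
            return [frozenset([p]), frozenset([("not", p)])]
    return None


def proc_branch(fs):
    """True iff the branch can be expanded to a closed tableau."""
    new = linear_new(fs)
    if new:                    # not linearly saturated: expand linearly first
        return proc_branch(fs | new)
    if is_closed(fs):          # linearly saturated and closed
        return True
    options = branching_options(fs)
    if options is None:        # saturated and open
        return False
    return all(proc_branch(fs | extra) for extra in options)


def proc(phi):
    """True iff phi has a closed tableau, i.e. phi is unsatisfiable."""
    return proc_branch(frozenset([phi]))


p, q = ("atom", "p"), ("atom", "q")
print(proc(("and", p, ("not", p))))
print(proc(("or", p, q)))
```

Each recursive call strictly enlarges the branch with formulas drawn from a finite pool (subformulas and their negations), which is the same shape of termination argument the paper uses for the full procedure.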

Thus, we use mlss_proc_branch only on branches that result from applying
the expansion rules. We call this kind of branch well-formed. In the definition
below, the expression b' >* b denotes that b' is one of the branches that result
from applying zero or more expansion rules to b.


definition wf_branch b ≡ ∃φ. b >* [φ]


We use this notion in Sect. 7 to state an upper bound for the cardinality of 
well-formed branches. The upper bound justifies the termination of the decision 
procedure. Before we come to that, though, we prove soundness and completeness 
in Sects. 6 and 5, respectively. In Sect. 7, we also show that both properties easily
transfer to mlss_proc, which, together with termination, establishes that it is a 
decision procedure. 


5 Completeness of the Calculus 


For completeness of the calculus, we need to show that every unsatisfiable for- 
mula has a closed tableau or, conversely, that the formula is satisfiable if there 
is a saturated and open branch in the tableau. To facilitate inductive reasoning, 
we show a stronger statement by constructing a model M such that M ⊨ φ for
all φ ∈ set b. At the core of the model, there is a realisation function that
maps set terms to sets of type hf. A subset of the witnesses, which we call pure 
witnesses, receives special treatment from the realisation function for reasons 
that will become apparent in Sect. 5.1. The collection of set terms of a branch 
can thus be partitioned into two collections, as defined below. 


definition pwits :: 'a branch ⇒ 'a set where
  pwits b ≡ {c ∈ wits b. ∀t ∈ subterms (last b).
    (Var c =ₛ t) ∉ set b ∧ (t =ₛ Var c) ∉ set b}


definition subterms' :: 'a branch ⇒ 'a pset_term set where
  subterms' b ≡ subterms (last b) ∪ Var ` (wits b − pwits b)


¹ In the formalisation, the function mlss_proc_branch is actually parametrised by
choice functions to allow for refinement.


We aim to construct a syntactic model that we derive from the membership 
literals s ∈ₛ t in the branch. To this end, we construct a graph whose vertices
are the disjoint union of the sets above and there is an edge from s to t in
the graph if, and only if, s ∈ₛ t is in b. Note that we use Noschinski’s graph
library [22] which represents a graph as a record of vertices, arcs (directed edges), 
and two functions tail and head that map an arc to its source and target vertex, 
respectively. 


definition bgraph b ≡ let vs = Var ` pwits b ∪ subterms' b
  in ⦇ verts = vs, arcs = {(s, t). (s ∈ₛ t) ∈ set b},
       tail = fst, head = snd ⦈


The realisation function is defined relative to this graph. As mentioned before, 
the realisation function treats the pure witnesses differently than the rest of the 
set terms. The function evaluates terms in the latter set in accordance to the 
structure of the graph, i.e. the realisation of a vertex is defined as the union of 
the realisations of the parent vertices. For the former set, we choose a function 
I that assigns the pure witnesses pairwise distinct sets with cardinality greater 
than that of the vertices. We can always choose such a function since we assume 
an infinite universe of variables. Then, we return the singleton set HF {I x}, 
which, together with the cardinality constraint, guarantees that realisations are 
distinct between pure witnesses themselves as well as between pure witnesses 
and set terms. The notation u →_G s in the definition below indicates that there
is an edge from u to s in the graph G.


abbreviation parents G s ≡ {u. u →_G s}


function realise :: 'a pset_term ⇒ hf where
  x ∈ Var ` pwits b ⟹ realise x = HF {I x}
| t ∈ subterms' b ⟹ realise t = HF (realise ` parents (bgraph b) t)
| x ∉ verts (bgraph b) ⟹ realise x = 0


Again, we need to ensure that the assumptions of the equations are non- 
overlapping and that the function terminates. The former is taken care of by 
automation, leaving us to prove termination. The assumption that b is open 
implies that there are no membership cycles, thus bgraph b is acyclic. Further- 
more, the graph is finite by definition. Thus, we can use the cardinality of the 
set of ancestors as a measure that decreases in each recursive call. 

Before we prove that the realisation function constitutes a model in Sect. 5.2, 
we will first explain the significance of the pure witnesses. 


5.1 Characterisation of the Pure Witnesses


Recall that the pure witnesses of a branch b are those witnesses that are not 
related to other subterms in last b by equality. In the context of a well-formed 
branch, we can strengthen this characterisation to any set term and, in addition, 
we also get that there is no membership literal where a pure witness is on the 
right-hand side. Intuitively speaking, the realisation of a pure witness does not 
depend on the realisation of any other set term. 


lemma lemma_2:
  assumes wf_branch b and c ∈ pwits b
  shows (Var c =ₛ t) ∉ set b and (t =ₛ Var c) ∉ set b
    and (t ∈ₛ Var c) ∉ set b


So why are pure witnesses treated differently? Were they not treated separately,
the definition of realise would evaluate the pure witnesses to the empty set
0 :: hf. To see that this is a problem, consider the
branch b = [Var s ≠ₛ Var t, Var t ≠ₛ Var u] which expands to several
open and saturated branches, one of which is


[Var x ≠ₛ Var y, Var x ∈ₛ Var s, Var x ∉ₛ Var t,
 Var y ∈ₛ Var t, Var y ∉ₛ Var u] @ b


for some fresh x and y. Assigning both Var x and Var y a value of 0 would
contradict the literal Var x ≠ₛ Var y. To prevent this, we assign the pure witnesses
pairwise different values.
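
The example can be replayed with a small sketch of the realisation function over nested frozensets. The names ordinal and realise_all, the use of von Neumann ordinals as the pairwise distinct values for the pure witnesses, and the choice of s, t, u, x and y as the vertex set are assumptions made for this illustration.

```python
def ordinal(n):
    """Von Neumann ordinal n as a nested frozenset; it has exactly n elements."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])
    return s


def realise_all(vertices, arcs, pure_witness_value):
    """Realise every vertex of an acyclic membership graph.

    arcs contains (s, t) whenever the literal 's in t' is on the branch;
    pure_witness_value maps each pure witness to its designated set.
    """
    cache = {}

    def realise(v):
        if v not in cache:
            if v in pure_witness_value:
                cache[v] = frozenset([pure_witness_value[v]])
            elif v in vertices:
                parents = [u for (u, w) in arcs if w == v]
                cache[v] = frozenset(realise(u) for u in parents)
            else:
                cache[v] = frozenset()
        return cache[v]

    return {v: realise(v) for v in vertices}


vertices = {"s", "t", "u", "x", "y"}
arcs = {("x", "s"), ("y", "t")}  # x in s and y in t are on the branch

# Without special treatment, both witnesses collapse to the empty set.
naive = realise_all(vertices, arcs, {})
print(naive["x"] == naive["y"])

# With distinct values of cardinality above the number of vertices (5),
# all literals of the branch are satisfied.
proper = realise_all(vertices, arcs, {"x": ordinal(6), "y": ordinal(7)})
print(proper["x"] != proper["y"], proper["x"] in proper["s"])
```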

The proof of lemma_2 is more technical than interesting so we refer the reader 
to the formalisation. 


5.2 Realisation of an Open Branch 


Remember that for completeness, we need to show that the realisation function 
for an open and saturated branch b actually constitutes a model for all formulas 
in the branch. We start by verifying that the realisation function models all 
literals in the branch; more formally, the following propositions hold: 


(1) We have realise s ∈ realise t if it holds that s ∈ₛ t is in b.
(2) We have realise s = realise t if s =ₛ t is in b.
(3) We have realise s ≠ realise t if s ≠ₛ t is in b.
(4) We have realise s ∉ realise t if it holds that s ∉ₛ t is in b.


To illustrate the usefulness of lemma_2, we prove Proposition (2). The proofs of 
all propositions translate well into Isabelle, so we refer to the original paper [9] 
for the remaining proofs. 


Proof. (Proof of Proposition (2)). Assume that s =ₛ t is in b. If there exists
a c ∈ pwits b where s = Var c or t = Var c, we arrive at a contradiction
due to lemma_2. Therefore, both s ∈ subterms' b and t ∈ subterms' b must
hold. Now, assume for contradiction that realise s ≠ realise t. Without
loss of generality (the other case is symmetric), we obtain an e such that
e ∈ realise s and e ∉ realise t. Considering that s ∈ subterms' b and
the definition of realise, we obtain a d with e = realise d and d →_(bgraph b) s.
This, in turn, yields that d ∈ₛ s must be in b. Together with the assumption
(s =ₛ t) ∈ set b and the saturation of b, it follows that d ∈ₛ t must also be
in b. But then we have realise d ∈ realise t ⟷ e ∈ realise t using
Proposition (1), which is a contradiction to the assumption e ∉ realise t.


We now lower the results on literals to set terms. All of the proofs are straight- 
forward so we refer the reader to the formalisation. 


(a) It holds that realise ∅ = 0.
(b) Let ∗ₛ ∈ {⊔ₛ, −ₛ, ⊓ₛ}. If the term s ∗ₛ t occurs in subterms b, then
    realise (s ∗ₛ t) = realise s ∗ realise t.
(c) If Single t ∈ subterms b, then
    realise (Single t) = HF {realise t}.


The final step for obtaining a proper model is to connect the realisation 
function to the semantics as defined in Sect. 2. For set terms, we can use the 
Propositions (a)—(c) to prove the lemma below by induction on t. 


lemma assumes t ∈ subterms b
  shows Iₛₜ (λx. realise (Var x)) t = realise t


Lifting the above result to formulas yields the coherence of b, as the original
paper [9] calls it. The proof is a tedious but straightforward induction on the
size of the formulas.


lemma coherence:
  assumes φ ∈ set b shows (λx. realise (Var x)) ⊨ φ


The coherence property finishes the proof of completeness of the calculus as it 
gives us a model for every formula in an open and saturated branch. 


6 Soundness of the Calculus 


A tableau calculus is sound if the corresponding formula is unsatisfiable for any 
closed tableau. We prove the following two properties to establish soundness: 


(1) It is impossible to satisfy all formulas in a closed branch simultaneously. 
(2) The expansion rules maintain satisfiability. 


We formalise the first property in Isabelle below. 


lemma bclosed_sound:
  assumes bclosed b shows ∃φ ∈ set b. ¬ M ⊨ φ


Proof. It is clear that, for any s, M models neither s ∈ₛ ∅ nor s ≠ₛ s. Furthermore,
no model can satisfy both φ and ¬φ at the same time. Lastly, a membership
cycle is impossible since the membership relation of hf is well-founded.


We are left with showing that both linear and branching expansion rules preserve 
satisfiability. As for the linear rules, a straightforward proof by case analysis on 
b' > b suffices to obtain the lemma below.


lemma lexpands_sound:
  assumes b' > b and φ ∈ set b' and ⋀ψ. ψ ∈ set b ⟹ M ⊨ ψ
  shows M ⊨ φ


A similar argument would work for the branching rules if it were not for the last 
rule adding new variables. Those variables need to be assigned specific values; 
hence, we modify the model as shown in the proof below. 


lemma bexpands_sound:
  assumes bs' > b and ⋀ψ. ψ ∈ set b ⟹ M ⊨ ψ
  shows ∃M'. ∃b' ∈ bs'. ∀ψ ∈ set (b' @ b). M' ⊨ ψ


Proof. We only consider the case where bs' > b was proved by applying the
last branching expansion rule to s ≠ₛ t for some s and t. We have


bs' = {[Var x ∈ₛ s, Var x ∉ₛ t], [Var x ∈ₛ t, Var x ∉ₛ s]}


for some fresh variable x. Since s ≠ₛ t is in b, we have that Iₛₜ M s ≠ Iₛₜ M t
because M is a model. Without loss of generality, this inequality manifests itself
through some y with y ∈ Iₛₜ M s and y ∉ Iₛₜ M t. We update M to map x to
y to obtain the assignment M'. Note that M' is still a model for formulas in b
because x is fresh with respect to b. Furthermore, it is also a model for the first
branch in bs', which finishes the proof.


7 Total Correctness of the Decision Procedure 


We first demonstrate the termination of the procedure for well-formed branches, 
i.e. every well-formed branch is in the domain of mlss_proc_branch. To this end, 
we derive an upper bound for the number of distinct formulas in a branch whose 
proof we omit here for brevity. We should point out that this bound is not to 
be construed as the complexity of the procedure as it may create exponentially 
many branches in general. 


lemma card_wf_branch_ub: 
assumes wf_branch b 
shows |set b| < 2*|subfms (last b)| + 16*|subterms (last b) |* 


Remember that mlss_proc_branch only applies a linear expansion rule to a
branch if the application results in new formulas. Moreover, the subsumption
conditions of the branching expansion rules ensure that each of the newly created
branches contains new formulas. Ultimately, we conclude that the procedure must
terminate for well-formed branches because the number of formulas increases in
each step but is also bounded.


lemma assumes wf_branch b shows mlss_proc_branch_dom b


The above lemma allows us to utilise the computation induction rule of 
mlss_proc_branch on well-formed branches, which we use to prove soundness 
and completeness. As both proofs are essentially an application of soundness, 
respectively completeness, of the calculus, we refer the reader to the formalisa- 
tion. 


lemma mlss_proc_branch_complete:
fixes b :: ’a branch
assumes wf_branch b and ¬ mlss_proc_branch b
assumes infinite (UNIV :: ’a set)
shows ∃M. M ⊨ last b


lemma mlss_proc_branch_sound:
assumes wf_branch b and ∀ψ ∈ set b. M ⊨ ψ
shows ¬ mlss_proc_branch b


To finish the proof of total correctness, note that every singleton branch 
is trivially well-formed; thus, termination, completeness, and soundness easily 
transfer to mlss_proc. 


theorem mlss_proc_complete:
fixes φ :: ’a pset_fm
assumes ¬ mlss_proc φ and infinite (UNIV :: ’a set)
shows ∃M. M ⊨ φ


theorem mlss_proc_sound:
assumes M ⊨ φ shows ¬ mlss_proc φ


8 Dealing with Urelements 


In the introduction, we stated the goal of integrating mlss_proc as a tactic 
into Isabelle. For this to work, we must map every branch expansion rule to a 
corresponding theorem in Isabelle/HOL. This is straightforward for all expansion 
rules except for the last branching expansion rule. To illustrate, suppose that we 
are to disprove a statement of the form 


s ≠ (t :: ’a) ∧ s ∈ (A :: ’a set) ∪ B ∧ ...


in Isabelle/HOL. By way of reification, we convert this to a formula of the shape


s’ ≠_s t’ ∧ s’ ∈_s A’ ⊔_s B’ ∧ ...


in our set syntax for some s’, t’, A’, and B’. When we apply the decision
procedure to this formula, it might return a tableau proof that contains an
application of the last branching rule to (s’ ≠_s t’) ∈ set b. This results
in two branches, one of which is [Var x ∈_s s’, Var x ∉_s t’] @ b; however,
there is no matching rule in Isabelle/HOL since s and t are not sets.

To deal with this problem, we formalise a lightweight type system as displayed
in Fig. 1. The type of a set term in this system is just a natural number which
we call level. Intuitively speaking, the level l means that the corresponding term
t in Isabelle/HOL has type


’a set ... set    (with set applied l times)


for some ’a. Note that the constructor ∅ now receives an additional argument
indicating the level of each instance of ∅.

Moreover, the typing judgement extends to set atoms by matching up the 
levels of its component set terms. 

Ultimately, we define Γ ⊢ φ ≡ ∀a ∈ atoms φ. Γ ⊢ a in order to type
formulas.

We can now define the urelements with respect to a formula. An urelement 
is a set term whose corresponding type in Isabelle/HOL might not be a set. 


definition urelem :: ’a pset_fm ⇒ ’a pset_term ⇒ bool where
urelem φ t ≡ ∃Γ. Γ ⊢ φ ∧ Γ ⊢ t : 0


Using this definition, we make two changes to the specification of the calculus: 
(1) First and foremost, we require that neither s nor t is an urelement in the 
precondition of the last branching expansion rule. (2) As mentioned above, we 
add an argument to the ∅ constructor. This argument is only used for the typing
judgement; it has no impact on the semantics. 

Soundness, of course, is not affected by these changes but we have to make
a few amendments to maintain completeness: (1) The first equation of realise
now also must account for the urelements. In particular, it has to ensure that
urelements receive pairwise different values unless they are related through equality
atoms. This does not affect pure witnesses since they cannot be related through
equality atoms due to lemma_2. (2) We must adjust the completeness proof in
those places where it directly refers to the definition of realise to account for
the case where a given term is an urelement. (3) The completeness theorem
receives the additional assumption that Γ ⊢ φ holds for the initial formula φ.


Γ ⊢ ∅ n : Suc n          Γ ⊢ Var x : Γ x

     Γ ⊢ t : l
--------------------
Γ ⊢ Single t : Suc l

∗_s ∈ {⊔_s, ⊓_s, −_s}   Γ ⊢ s : l   Γ ⊢ t : l   l ≠ 0
------------------------------------------------------
                  Γ ⊢ s ∗_s t : l

Γ ⊢ s : l   Γ ⊢ t : l          Γ ⊢ s : l   Γ ⊢ t : Suc l
---------------------          -------------------------
     Γ ⊢ s =_s t                      Γ ⊢ s ∈_s t


Fig. 1. The type system for set terms and atoms.

(4) For the completeness proof, we must show that the typing judgement is
invariant under branch expansion.
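
To make the typing judgement of Fig. 1 concrete, the following Python sketch checks set terms and atoms against a given assignment Γ from variables to levels. It is only an illustration and not part of the Isabelle formalisation: the tuple encoding of terms and the names level, types_atom and types_formula are assumptions introduced here.

```python
# Hypothetical encoding of set terms as tuples (not the paper's datatype):
#   ("empty", n), ("var", x), ("single", t),
#   ("union", s, t), ("inter", s, t), ("diff", s, t)
# Atoms are ("eq", s, t) and ("elem", s, t).

def level(gamma, term):
    """Return l such that gamma |- term : l, or None if the term is untypable."""
    kind = term[0]
    if kind == "empty":                      # empty set annotated with n has level Suc n
        return term[1] + 1
    if kind == "var":                        # variables take their level from gamma
        return gamma[term[1]]
    if kind == "single":                     # t : l  gives  Single t : Suc l
        l = level(gamma, term[1])
        return None if l is None else l + 1
    if kind in ("union", "inter", "diff"):   # both operands share a level l != 0
        ls, lt = level(gamma, term[1]), level(gamma, term[2])
        return ls if ls is not None and ls == lt and ls != 0 else None
    raise ValueError(f"unknown term: {term!r}")


def types_atom(gamma, atom):
    """Check gamma |- atom by matching up the levels of the component terms."""
    op, s, t = atom
    ls, lt = level(gamma, s), level(gamma, t)
    if ls is None or lt is None:
        return False
    if op == "eq":                           # equality relates terms of one level
        return ls == lt
    if op == "elem":                         # membership raises the level by one
        return lt == ls + 1
    raise ValueError(f"unknown atom: {atom!r}")


def types_formula(gamma, atoms):
    """A formula is typed when all of its atoms are typed."""
    return all(types_atom(gamma, a) for a in atoms)


# Atoms of the running example: s and t are compared, and s is a member of A ∪ B.
atoms = [
    ("eq", ("var", "s"), ("var", "t")),
    ("elem", ("var", "s"), ("union", ("var", "A"), ("var", "B"))),
]
gamma = {"s": 0, "t": 0, "A": 1, "B": 1}
print(types_formula(gamma, atoms))
```

Under this Γ both s and t receive level 0, which is the situation in which the last branching rule must not be applied to them.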

The modifications above ensure that the proof can be replayed through
Isabelle/HOL. To actually use the calculus, however, we must determine the
urelements of the initial formula φ. In other words, we have to implement an inference
algorithm for our lightweight type system. The algorithm is, in essence, a simplified
version of Hindley-Milner type inference, so it has the same two phases:
it generates constraints using syntax-directed rules and then passes them to a
constraint solver.

Since we are only interested in the level of a term, we can encode all constraints
into the theory of 0, the successor function S, and equality (but no disequality).
Note that constraints of the form l ≠ 0 can be replaced by l = S i
with i being a fresh variable. A solver for this theory is straightforward to
implement and verify; nevertheless, we have to be careful that it computes the
minimum assignment Γ from variables to levels that fulfils the constraints. This
guarantees that a set term t is not an urelement if, and only if, Γ t > 0.
Conversely, all terms s with Γ s = 0 are urelements.
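
One way to compute such a minimum assignment is a union-find structure that stores, for every variable, its level offset relative to its class representative. The Python sketch below illustrates this idea only; it is not the verified solver of the formalisation, and the class LevelSolver, the reserved node ZERO and the method names are assumptions made for the example. A constraint is given as S^ka(a) = S^kb(b).

```python
ZERO = "$0"  # hypothetical reserved node standing for the constant 0


class LevelSolver:
    def __init__(self):
        self.parent = {ZERO: ZERO}
        self.delta = {ZERO: 0}  # delta[v] = level(v) - level(parent[v])

    def find(self, v):
        """Return (root, level(v) - level(root)), compressing the path."""
        if v not in self.parent:
            self.parent[v], self.delta[v] = v, 0
        if self.parent[v] == v:
            return v, 0
        root, d = self.find(self.parent[v])
        self.parent[v] = root
        self.delta[v] += d
        return root, self.delta[v]

    def add_eq(self, a, ka, b, kb):
        """Record S^ka(a) = S^kb(b); return False on an immediate clash."""
        ra, da = self.find(a)
        rb, db = self.find(b)
        if ra == rb:
            return da + ka == db + kb
        # level(ra) + da + ka = level(rb) + db + kb
        self.parent[ra] = rb
        self.delta[ra] = db + kb - da - ka
        return True

    def solve(self):
        """Return the pointwise minimum assignment, or None if unsatisfiable."""
        classes = {}
        for v in list(self.parent):
            root, d = self.find(v)
            classes.setdefault(root, {})[v] = d
        levels = {}
        for members in classes.values():
            # A class containing 0 is pinned; otherwise shift it down as far
            # as possible while keeping every level non-negative.
            if ZERO in members:
                base = -members[ZERO]
            else:
                base = -min(members.values())
            for v, d in members.items():
                if base + d < 0:
                    return None
                levels[v] = base + d
        del levels[ZERO]
        return levels


# Constraints for the running example: s and t are compared, and s is a
# member of A ∪ B.
solver = LevelSolver()
solver.add_eq("s", 0, "t", 0)   # level s = level t
solver.add_eq("A", 0, "B", 0)   # operands of the union share a level
solver.add_eq("A", 0, "i", 1)   # level A ≠ 0, encoded as level A = S i
solver.add_eq("A", 0, "s", 1)   # membership: level (A ∪ B) = S (level s)
levels = solver.solve()
print(levels)
```

In the computed assignment s and t sit at level 0 and are therefore urelements, while A and B sit at level 1. Note that the result also contains the fresh variable i, which would have to be filtered out before reading off urelements.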


9 Conclusion and Future Work 


We developed a formalisation of a tableau calculus for a quantifier-free fragment 
of set theory called MLSS based on a paper by Cantone and Zarba [9]. The for- 
malisation includes an abstract description of a decision procedure that builds 
on the calculus. To make the decision procedure compatible with Isabelle/HOL, 
we extended the calculus with a lightweight type system while maintaining com- 
pleteness. We also refined the abstract specification to an executable specification 
from which code can be generated. 

In future work, we plan to implement an efficient executable specification in 
the style of a worklist algorithm. This specification should also generate certifi- 
cates that can be replayed through Isabelle’s inference kernel to facilitate the 
integration of the procedure into Isabelle. 


Acknowledgements. The author thanks Kevin Kappelmann and Tobias Nipkow for 
their comments on a draft version of this paper and the anonymous referees for their 
thorough reviews.


Abstract. We describe an experimental implementation of a logic-based
end-to-end pipeline of performing inference and giving explained answers 
to questions posed in natural language. The main components of the 
pipeline are semantic parsing, integration with large knowledge bases, 
automated reasoning using extended first order logic, and finally the 
translation of proofs back to natural language. While able to answer 
relatively simple questions on its own, the implementation is targeting 
research into building hybrid neurosymbolic systems for gaining trust- 
worthiness and explainability. The end goal is to combine machine learn- 
ing and large language models with the components of the implementa- 
tion and to use the automated reasoner as an interface between natural 
language and external tools like database systems and scientific calcula- 
tions. 


1 Introduction 


Question answering and inference using natural language is a classic A.I. area, 
with a long history of little success using symbolic methods, able to solve only 
small problems with a limited structure. The recent machine learning (ML) 
systems, in particular, the Large Language Model (LLM) implementations of 
the BERT and GPT families are, in contrast, often able to give satisfactory 
answers to nontrivial questions. 

However, the current LLMs are neither trustworthy nor explainable. They 
have a well-known tendency of “hallucinating”, i.e. giving wrong answers and 
inventing actually nonexistent entities and facts. The problems of explicitly con- 
trolling the output and giving explanations for the solutions appear to be very 
hard for LLMs. An optimistic view of LLMs suggests that end-to-end learning 
can be improved to overcome these issues, while a more pessimistic view sug- 
gests that the problems are inherent and stem from the lack of an internal world 
model. The proponents of the latter view propose to build hybrid neurosymbolic 
systems, combining machine learning and symbolic methods of various kinds.
Indeed, the research in the field of neurosymbolic systems has become quite
active. The recent survey [14] points to a wider interest in connecting natural 
language systems to external software like databases and scientific calculations. 

Using logic for natural language inference (NLI) in combination with ML 
may potentially alleviate the problems with LLMs and provide a glue to connect 
external systems to natural language interfaces. However, using logic directly for 
processing natural language is hard, for a number of reasons: 


— Semantic parsing, i.e. translating natural language to logic, is extremely hard 
due to the highly complex and exception-rich nature of natural language. 

— Existing knowledge bases of “common sense” do not cover a critical mass of 
the basic understanding of the world even a small child possesses. 

— Classical first order reasoning itself cannot cope with contradictory knowledge 
items, probabilistic or uncertain information and exceptions to rules. 

— Logic-based question answering often requires long proofs, and the huge knowledge
base causes a quick combinatorial explosion of the search space.


The motivation behind the research described in the paper is the following 
hypothesis: all the main problems described above can be alleviated by using ML 
techniques tailored separately for each particular problem. The current paper 
does not introduce any ML techniques for the problems above. The goal of our 
system is to serve as a backbone for research into combining the symbolic meth- 
ods with ML. Our hypothesis is that by gradual improvement and combination 
of the existing symbolic subsystems with ML techniques it is possible to eventu- 
ally build a question answering system which has enough power, trustworthiness 
and explainability to be practically useful in various application areas. 

In other words, the envisioned end goal of this research is neither to replace 
LLMs nor to verify their output, but to develop systems combining LLMs and 
symbolic reasoning for specific areas where it is feasible to build sets of domain- 
specific rules and factual databases. 


2 Related Work 


Here we will only consider projects building a full NLP inference system. The 
performance of older pure symbolic or logic-based methods like LogAnswer [7] 
remained at the level of specific toy examples and never achieved capabilities 
required for wider applicability. The long-running CYC project [22], although 
having several successes, did not succeed with its original stated goals, which is 
often used as an argument against symbolic systems. 

A popular area for language processing is converting human queries to SQL 
or SPARQL queries. These systems typically do not handle rules expressed in 
natural language. The projects closest to ours use reasoners with a relatively
limited capacity, like BRAID [12], which uses an extended SLD+ reasoner with
probabilistic rules and fuzzy unification, CASPR [18], which uses an ASP reasoner
incorporating default logic, and NatPro [1,2], which uses a Natural Logic prover. The
last of these is the only such project we know to be publicly available:
https://github.com/kovvalsky/prove-SICK-NL.

The majority of research in neurosymbolic reasoning for natural language 
combines ML with weak forms of symbolic systems, typically taxonomies and 
triple graph knowledge bases like ConceptNet [25]. We approach the problem 
from the less common direction: starting from the symbolic/reasoning side and 
moving towards ML. There are already a few research projects combining ML 
with reasoning in quantified first order logic, although we are not aware of any 
such systems being publicly available. Noteworthy projects involving quantified 
logic are SQuARE [4], BRAID [12] and STAR [21]. Recent work on using large
language models (LLMs) to map informal proofs to formal Isabelle [17] proof
sketches that guide an automated prover [34], and on using LLMs directly to generate
Isabelle code [11], shows clear promise in combining LLMs with provers.


3 Natural Language Inference and Question Answering 


The described pipeline is able to handle both the natural language inference 
(NLI) tasks (given a premise, determine whether a given hypothesis is true, false 
or indeterminate) and the closely related question answering tasks of finding a 
specific object matching a given criterion. 

We will use a few simple examples throughout the paper. The expected 
answer to the first example “If an animal likes honey, then it is probably a bear. 
Most bears are big, although young bears are not big. John is an animal who 
likes honey. Mike is a young bear. Who is big?” is “Likely John”. The expected 
answer to the second example “The length of the red car is 4m. The length of 
the black car is 5m. The length of the red car is less than 5m?” is “True”. 

It is worth noting that these examples are solved correctly by the current 
(May 2023) versions of GPT: ChatGPT using the text-davinci-002 model and 
the API using the gpt-3.5-turbo and gpt-4 models; moreover, they are able to
give a satisfactory explanation of the reasoning behind the answers. However, if 
we insert additional irrelevant information to the first example, our system still 
finds the expected answer, while none of the GPT models above give a correct 
answer: “If an animal likes honey, then it is probably a bear. Most bears are big, 
although young bears are not big. John is an animal who likes honey. Mike is 
a young bear. Mike can eat a lot. Penguins are birds who cannot fly. John took 
the block from the colored table. The table was really nice. The robot arm lifted 
a blue block from the table. Who is big?”. 

Similarly, when we modify the second example by using meaningless words 
and adding irrelevant text, our system finds the expected answer, while all 
the referred GPT models give confusing answers: “The length of the barner 
is 200000000 m. The length of the red foozer is 3812435m. Most barners are 
1000000 m long. Sun is larger than the moon. John saw the sun rising over an 
enormous foozer. A huge robot filled the sky. The length of the red foozer is less 
than 312546 m?” However, the answers given by GPT versions may vary over 
time, i.e. experiments with GPT are not reproducible.


4 The Question Answering Pipeline


Our system is publicly available at http://github.com/tammet/nlpsolver. It 
requires Linux and should be easy to install. The implementation consists of 
four main software systems. The pipeline driver calls the external Stanza parser 
[20] from Stanford, giving a Universal Dependencies (UD, see [5]) graph, then 
runs the semantic parser on the UD graph, calls the reasoner, and finally builds a 
natural language answer along with the explanation built from the proofs given 
by the reasoner. The pipeline driver, parser and answer construction compo- 
nents consist of over 400 Kbytes of Python code. Before running the solver, a 
small Python server component has to be started, to initialize the external UD 
parser Stanza and read a commonsense knowledge base into shared memory. For 
reasoning the pipeline calls our commonsense reasoner GK, written in C: this 
is the largest and the most complex part of the pipeline. There is a separate 
Python program for regression tests, along with several Python files containing 
sub-tests, currently over 1600 separate NLI tasks. The pipeline driver is called 
from a command line, with a natural language text and question as a command 
line argument, plus a number of optional arguments that control behavior such as
the amount of output.


4.1 Semantic Parsing 


The parser takes English strings of natural language text as input and outputs 
extended clausified first-order logic formulas encoded in JSON as proposed in 
JSON-LD-LOGIC [29]. The main extension is adding numerical confidence to 
clauses and implementing default logic [23] by including special literals to encode 
exceptions, as presented in our papers [28] and [27]. 

Parsing consists of a number of phases, each adding new structural details 
to the results of the previous phases. For the most part, the phases are imple- 
mented procedurally, without using explicit transformation rules: we found that 
the more complex aspects of translation cannot be easily expressed with the 
help of simple transformation rules. In particular, the correct interpretation of 
a sentence depends heavily on previous sentences and a collected database of 
objects which have been talked about. 


Conversion to Universal Dependencies (UD) Format. We use the exter- 
nal Stanza parser to get the UD format dependency graphs from input sentences. 
Stanza itself uses pretrained neural models. We first preprocess English strings 
to avoid several typical mistakes of the Stanza conversion, and then use Stanza 
to get the UD graph. The graph is then fed to our small set of simplifying trans- 
formations returning a simplified text, which is again fed to Stanza to get the 
final UD graph. The simplification phase reduces the complexity and the amount of
edge-case handling necessary in the UD-to-logic converter, and is a prime
candidate for experimenting with LLM-based simplification.


Converting UD to Logic. One of the strengths of the UD representation given
by Stanza is its high level of detail. The first subphase of conversion restructures
the UD graph into a semi-logical representation explicating the outward
logical structure around the subject/verb, object/verb or subject/verb/object
tuples. The following subphases attach different kinds of properties to words.
For example, the outermost structure constructed for the sentence “Most bears
are big, although young bears are not big.” is

[and, svo[bear,be,big], svo[bear,be,big]] which is then extended to
[and, svo[bear,be,big], svo[[props,young,bear],be,big]].

The words in these structures are key-value objects containing both the initial 
UD information and additional details added during the phases. 

The next subphase results in the extended logic in a non-clausified form, i.e. 
using explicit quantifiers. The conversion uses the previous structure recursively, 
taking into account the details of the original UD structure to find additional 
critical information like articles, negation, different kinds of quantifiers etc. We 
follow the approach of Davidsonian semantics, introducing event identification 
variables, while not taking the neo-Davidsonian path of splitting all relations to 
their minimal components (see [33]).

For the coreference resolution we calculate the weighted heuristic scores for all 
candidate words, using also taxonomies of Wordnet. Another inherently complex 
task is determining whether a noun stands for a concrete object or should be 
quantified over. Importantly, any object detected is stored in a special data 
structure with new information about the object possibly added as the parsing 
process proceeds. 

Let us consider an example sentence “John is a nice animal who likes honey.”
It would be first converted to a conjunction of three formulas


isa(animal, c1_John)
prop(nice, c1_John, generic, generic, ctxt(Pres, 1))
def0(c1_John)
∀S (def0(c1_John) ⇔
    ∃X isa(honey, X) ∧ (∃A do2(like, c1_John, X, A, ctxt(Pres, S))))


The system determined that in this sentence “John” refers to a concrete object 
and immediately created a Skolem constant c1_John, storing it for possible later 
use and extension. Here it also created a new definition def0 for encoding the 
complex property of “John”: liking honey. Properties of objects, like the one in
the second formula above, also encode the intensity of the property (slightly/very)
and the comparative class: for example, saying “John is a very large animal
...” would create prop(large, c1_John, 3, animal, ctxt(Pres, 1)). The constant
generic indicates that intensity is not known or that the property is not com- 
parative, i.e. does not relate to a specific class. The term ctxt(Pres, 1) encodes 
contextual aspects: the present tense and a concrete situation number in a possi- 
ble sequence of situations created by different actions. The variable A in the last 
formula is an identifier of an action, which can be given additional properties, 
like place, time or assistive objects of an action, in the Davidsonian style.

In the representations above we have omitted the information about confidence
and the possibility of exceptions. Indeed, the sentence we looked at is
considered to be certain and without exceptions. However, the first part of the
sentence “Most bears are big, although young bears are not big” attaches
confidence 0.85 to the formula and includes a blocker literal encoding an exception
in the sense of default logic, along with the comparative priority of the blocker:


0.85 : ∀X isa(bear, X) ⇒
    (prop(big, X, generic, generic, ctxt(Pres, 1))) ∨
    block(h(bear, 1), neg(prop(big, X, generic, generic, ctxt(Pres, 1))))


The blocker literals are used by the GK prover to recursively check the proof 
candidates found, with diminishing time limits: GK uses a part of a given time
limit to attempt to prove each blocker literal in the proof. Whenever a blocker 
is proved, the candidate proof containing the blocker is considered invalid and 
thus discarded; see [27] for details. 
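
The recursive blocker check can be pictured with the following Python sketch. It is a toy model under stated assumptions and not GK's implementation: the Proof class, the find_proof callback, the shrink factor and the minimal limit are all hypothetical, and the time limit is only passed along as a number rather than enforced.

```python
from dataclasses import dataclass, field


@dataclass
class Proof:
    goal: str
    blockers: list = field(default_factory=list)  # blocker literals used by the proof


def is_valid(proof, find_proof, time_limit, shrink=0.5, min_limit=0.01):
    """A candidate proof is valid iff none of its blockers has a valid proof.

    find_proof(goal, time_limit) is a hypothetical prover callback returning
    a Proof or None. Each recursion level gets a smaller time budget.
    """
    sub_limit = time_limit * shrink
    if sub_limit < min_limit:
        return True  # budget exhausted: remaining blockers count as unproved
    for blocker in proof.blockers:
        counter = find_proof(blocker, sub_limit)
        if counter is not None and is_valid(counter, find_proof, sub_limit,
                                            shrink, min_limit):
            return False  # the exception applies, so the candidate is discarded
    return True


# Toy "prover": a lookup table of statements that can be proved directly.
known = {"not big(mike)": Proof("not big(mike)")}


def lookup(goal, time_limit):
    return known.get(goal)


mike = Proof("big(mike)", blockers=["not big(mike)"])
john = Proof("big(john)", blockers=["not big(john)", "not bear(john)"])
print(is_valid(mike, lookup, 1.0), is_valid(john, lookup, 1.0))
```

In this toy setting the candidate for Mike is rejected because its blocker is provable, while the candidate for John survives because neither of its blockers can be proved.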

The system is also able to handle simpler questions involving sizes of sets, like 
“An animal had two strong legs. The animal had a strong leg?”, “John has three 
big nice cars. John has two big cars?”, and measures, like “The length of the red 
car is 4m. The length of the black car is 5m. The length of the red car is less than 
5m?”. We use terms encoding the sets and measures: for example, the first
sentence of the last question is translated to a formula containing a standard equality
predicate, an integer and several properties involving the measure term, including
the main statement 4 = count(measure1(length, c1_car, meter, ctxt(Pres, 1))).


Instance Generation. In order to answer questions without indicating con- 
crete objects, like “Adult bears are large animals. Cats are small animals. Who 
is a large animal?” we need constants representing an anonymous instance of a 
class, essentially a “default adult bear”, a “default bear” and a “default cat”. 
For each such object the system generates a constant along with the formulas 
indicating its class and properties, enabling the system to produce an answer 
“An adult bear”. 


Question Handling. Actual questions like “Who is big?” or “The length of 
the red car is less than 5m?” require special handling. The automated reasoner 
GK used in the pipeline employs the well-known answer predicate technique 
to construct and output the required substitution term. All the variables in 
the question formula will be instantiated and output, potentially resulting in 
a large combination of different answers. The “Who is big?” question will be
first translated to ∃X, Y, Z prop(big, X, generic, Y, Z) indicating that we are not
restricting the “bigness” or context in the question. However, we do not want
to enumerate different “bigness” values or contexts in the answer, thus we wrap
the formula into a definition (say, def2) over a single variable X, and search
for different substitutions into def2(X) only. Asking questions about location
and time is implemented by constructing a number of questions over relations
“near”, “on”, “at”, etc.
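
The effect of wrapping the question into a definition over a single variable can be illustrated by a small Python sketch that matches a query pattern against ground facts and keeps only the distinct bindings of the answer variable. This is a simplification for illustration, not the answer predicate mechanism of GK: the flat tuple encoding, the context constants and the function names match and answers are assumptions.

```python
def match(pattern, fact, subst):
    """Match a flat pattern against a ground fact; variables start with '?'."""
    if len(pattern) != len(fact):
        return None
    subst = dict(subst)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding.
            if subst.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return subst


def answers(query, answer_var, facts):
    """Collect distinct values of answer_var, ignoring all other variables."""
    seen = []
    for fact in facts:
        s = match(query, fact, {})
        if s is not None and s[answer_var] not in seen:
            seen.append(s[answer_var])
    return seen


facts = [
    ("prop", "big", "c1_John", "generic", "generic", "ctxt_1"),
    ("prop", "big", "c1_John", "generic", "animal", "ctxt_2"),
    ("prop", "big", "some_bear", "generic", "generic", "ctxt_1"),
    ("prop", "small", "c2_Mike", "generic", "generic", "ctxt_1"),
]
# "Who is big?": only X is reported, while Y and Z are left unrestricted.
query = ("prop", "big", "?X", "generic", "?Y", "?Z")
result = answers(query, "?X", facts)
print(result)
```

Although c1_John matches twice with different values for Y and Z, it is reported only once, which is the behaviour the definition over X is meant to achieve.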


Clausification and Simplification. The system contains a clausifier skolemizing
the formulas and converting these to a conjunctive normal form. The
clausification phase also performs several simplifications, some of which are
possible due to the known properties of the constructed formulas. Since nontrivial
formulas may be converted into several clauses, the clausifier decides how to
spread the numeric confidence of the formula and the exception literals in the
formula into the clauses.


4.2 Integration with Knowledge Bases 


The knowledge base provides the world model of our reasoning system. To answer 
the query “Tweety is a bird. Can Tweety fly?”, the system needs to have the 
background knowledge that birds can fly. We construct the knowledge base (KB) 
using default logic rules augmented with numeric confidences. A small part of 
the knowledge base forms a core world model and is built by hand, while the 
bulk of the knowledge is integrated automatically from existing common sense 
knowledge (CSK) sources as described in [10]. 

We have integrated eight published knowledge graphs: ConceptNet [25],
WebChild [30], Aristo TupleKB [15], Quasimodo [24], Ascent++ [16],
UnCommonSense [3], ATOMIC 2020 [9] and ATOMIC 10x [32]. These CSK sources are
collections of relation triples. The majority of the sources contain natural language
clauses or fragments in the triple elements. We have built a specialized pattern
matching semantic parser to convert the relations to first order logic rules with
the default logic extensions and estimated numeric confidence. The full
knowledge base contains 18.5 million rules, of which over 15 million are related
to taxonomy: inferring a property or an event from the class of an entity.


4.3 Automated Reasoning 


We use our automated reasoner GK to solve the problems generated by the semantic
parser. The reasoner uses both the parser output and a selected subset of the
world knowledge to solve the questions. WordNet taxonomies are used to solve
the precedence problem of exceptions. Large datasets are parsed, indexed and
kept in shared memory for quick re-use. GK is built on top of GKC [26], a
conventional high-performance resolution-based reasoner for first order
logic. Thus GK inherits most of the capabilities and algorithms of GKC. The
main additional features of GK are the following:


— Using a well-known answer clause mechanism for finding a number of different 
answers, with a configurable limit. 

— Finding expected proofs even if a knowledge base is inconsistent. Basically, 
GK only accepts proofs which contain a clause originating from the question. 

— Searching for both a proof of the question and a negation of the ques- 
tion/negation of each concrete answer. 

— Estimating the numeric confidence in the statements derived from knowledge 
bases containing uncertain contrary and supporting evidence obtained from 
different sources. 

— Handling exceptions by implementing default logic via recursively deepening
iterations of searches with diminishing time limits.

— Performing reasoning by analogy via employing known similarity scores of
words along with exceptions.


The first four features are covered in our previous paper [28] and the fol- 
lowing two are covered in [27]. The word similarity handling is currently in an 
experimental phase: the initial experiments show that a naive implementation 
creates an unmanageable search space explosion, and thus a layered approach is 
necessary. 

As a simple example of the basic features, consider sentences “John is nice. 
John is not nice. Mike is nice. Steve is not nice.” GK output to the parsed 
versions of the following questions will directly lead to these answers: “John is 
nice?”: “Unknown”, “Mike is nice?”: “True”, “Mike is not nice?”: “False”, “Who 
is nice?”: “Mike”, “Who is not nice?”: “Steve”. For a slightly more complex 
example, consider the earlier “If an animal likes honey, then it is probably a 
bear. Most bears are big, although young bears are not big. John is an animal 
who likes honey. Mike is a young bear. Who is big?”. GK will output the following 
proof in JSON, where we have removed quotation marks and a number of steps: 


{result: answer found,
 answers: [
  {
   answer: [[$ans,some_bear]],
   blockers:
    [[$block,[$,bear,1],[$not,[prop,big,some_bear,$generic,$generic,[$ctxt,Pres,1]]]]],
   confidence: 0.85,
   positive proof:
    [
     [7, [mp,[5,1],6,fromgoal,0.85],
      [[$block,[$,bear,1],[$not,[prop,big,some_bear,$generic,$generic,[$ctxt,Pres,1]]]],
       [$ans,some_bear]]]
    ]},
  {
   answer: [[$ans,c1_John]],
   blockers:
    [[$block,[$,bear,1],[$not,[prop,big,c1_John,$generic,$generic,[$ctxt,Pres,1]]]],
     [$block,[$,animal,3],[$not,[isa,bear,c1_John]]]],
   confidence: 0.765,
   positive proof:
    [
     [1, [in,frm_10,axiom,0.85],
      [[$block,[$,bear,1],[$not,[prop,big,?:X,$generic,$generic,[$ctxt,Pres,1]]]],
       [prop,big,?:X,$generic,$generic,[$ctxt,Pres,1]],
       [-isa,bear,?:X]]],
     [2, [in,frm_9,axiom,0.9],
      [[$block,[$,animal,3],[$not,[isa,bear,?:X]]],
       [-do2,like,?:X,?:Y,?:Z,[$ctxt,Pres,1]],
       [-isa,honey,?:Y],[-isa,animal,?:X],[isa,bear,?:X]]],
     [18, [mp,[1,2],[17,1],fromaxiom,0.765],
      [[$block,[$,bear,1],[$not,[prop,big,c1_John,$generic,$generic,[$ctxt,Pres,1]]]],
       [$block,[$,animal,3],[$not,[isa,bear,c1_John]]],
       [prop,big,c1_John,$generic,$generic,[$ctxt,Pres,1]]]],
     [21, [in,frm_30,goal,1], [[-$def2,?:X],[$ans,?:X]]],
     [22, [mp,[20,2],21,fromgoal,0.765],
      [[$block,[$,bear,1],[$not,[prop,big,c1_John,$generic,$generic,[$ctxt,Pres,1]]]],
       [$block,[$,animal,3],[$not,[isa,bear,c1_John]]],
       [$ans,c1_John]]]
    ]}
 ]}

Observe that we get two answers. The following NLP pipeline step removes
the generic [[$ans,some_bear]], since the more informative [[$ans,c1_John]]
is available. Here both proofs contain only positive parts, although in the general
case we may find both a positive and a negative proof, each with their own con- 
fidences. GK will throw away both the clauses produced during search and the 
final answers which have a summary confidence below a configurable threshold. 
GK will also throw away proofs which do not contain a goal clause. The confi- 
dences stemming from input sentences like “Most bears are big ...” are taken 
from our ad-hoc mapping of words like “most” to numeric values. By default, 
“normal” rule sentences are given a confidence below one and include a blocker 
literal for allowing exceptions. 
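
The confidence bookkeeping described above can be illustrated with a minimal sketch. It assumes that the confidence of a derived clause is the product of the confidences of its premises, which is consistent with the figures in the example proof (0.85 × 0.9 = 0.765) but is not claimed to be the full calculus used by GK. The function names, the threshold value and the third answer are hypothetical.

```python
from math import prod

# Hypothetical cutoff; the text only says the threshold is configurable.
THRESHOLD = 0.5

def derived_confidence(premise_confidences):
    """Assumption: a derived clause gets the product of its premises' confidences."""
    return prod(premise_confidences)

def filter_answers(answers, threshold=THRESHOLD):
    """Drop answers whose summary confidence falls below the threshold."""
    return [a for a in answers if a["confidence"] >= threshold]

# "Most bears are big" (0.85) combined with "probably a bear" (0.9).
john = derived_confidence([0.85, 0.9])  # approximately 0.765

answers = [
    {"answer": "c1_John", "confidence": john},
    {"answer": "some_bear", "confidence": 0.85},
    {"answer": "hypothetical_low", "confidence": 0.3},  # invented for illustration
]
kept = filter_answers(answers)
print([a["answer"] for a in kept])
```

Only the first two answers survive the hypothetical threshold; removing the generic answer in favour of the more informative one is a separate pipeline step and is not modelled here.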

The answers contain blocker literals, which have been recursively checked by 
separate proof searches before the final proof is accepted by GK. The details 
of these failed searches are not shown in the final proof. Had we included the 
sentence “John is not big” in our example, then the proof of the first blocker of 
the main answer would have been found, thus disqualifying the proof and leaving 
us with the final answer “Likely a bear.”. 


4.4 Answers and Explanations in Natural Language 


Answers and explanations are generated from the proof, with additional details 
taken from the database of objects along with their properties as detected during 
semantic parsing. While some of the principles were described in the previous 
section, there are two major tasks to perform: give a suitably detailed representa- 
tion of objects in a proof (say, select between “a car”, “a red car”, “the red car”, 
“Mike’s car” etc.) and create a grammatically correct and easy-to-understand 
textual representation of clauses. The system translates clauses in a proof one- 
to-one to English sentences, as exemplified by the explanation generated from 
the previously presented proof: 

Likely john: 

Confidence 76%. 

Sentences used: 

(1) If an animal likes honey, then it is probably a bear. 

(2) Most bears are big, although young bears are not big. 

(3) John is an animal who likes honey. 

(4) Who is big? 

Statements inferred: 

(1) If X is a bear, then X is big. Confidence 85%. Why: sentence 2. 

(2) If X does like Y and Y is a honey and X is an animal, then X is a bear.
Confidence 90%. Why: sentence 1.
(4) If John has a property defi, then John does like cs4. Why: sentence 3. 


(18) John is big. Confidence 76%. Why: statements 1, 17. 


(21) If X matches the query, then X is an answer. Why: the question. 
(22) John is an answer. Confidence 76%. Why: statements 20, 21. 


5 Performance and the Test Set 


The system performs poorly on most well-known natural language
inference or question answering benchmarks, the majority of which are
oriented towards machine learning. As an exception, the performance on the
anti-machine-learning question set HANS [13] is ca 95%, in contrast to the ca 60%
performance of LLM systems before the GPT-3 family (random choice would
give 50% performance). The remaining 5% of HANS is lost to wrong UD parses
chosen by Stanza.

However, the system is able to solve almost all of the demonstration examples 
of the Allen AI ProofWriter system (https://proofwriter.apps.allenai.org/) and
is able to solve inference problems the current LLM systems cannot, like the 
examples presented in the introduction. For regression testing we have built a 
set of ca 1600 simple questions with answers, structured over different types of 
capabilities. This test set may be of use for people working towards similar goals. 

The runtime for the small examples presented in the paper is ca 0.5s on a 
Linux laptop with a graphics card usable by Stanza. Of this time, Stanza UD 
parsing takes ca 0.17s, UD to logic takes ca 0.04s, and the rest is spent by the 
reasoner. For more complex examples the reasoner may spend unlimited time, 
i.e. the question is rather how complex questions can be solved in a preconfigured 
time window. In case the size of the input problem is relatively small and a tiny 
world model suffices for the solution, the correct answer is found in ca 1-2 s. 
However, in case the system is given a large knowledge base (KB) with a size 
of roughly one gigabyte, and the answer actually depends on the KB, then the 
search space may explode and the system may fail to find an answer in a reasonable
time. Efficiently handling a very large knowledge base clearly requires suitable 
heuristics based on the semantics and interdependence of rules/facts in the KB. 


6 Towards a Hybrid Neurosymbolic System 


Although the scope of the sentences successfully parsed and questions answered 
could be improved by adding more and more specialized cases to the current 
system, the cost/benefit ratio of this work would rapidly decrease. We’ll describe 
the most promising avenues of extending the system with ML hybridization as 
we currently see them. 


Semantic Parsing. The two main approaches would be (a) end-to-end learning 
from sentences directly to extended logic as exemplified in [31], and (b) using 
existing LLMs or training specialized LLMs to perform simplification of sen- 
tences to the level where a hand-made semantic parser is able to convert the 
sentence to logic. Our initial experiments with the GPT models have shown 
that using a suitable prompt causes the LLMs to successfully split and simplify 
complex sentences. 


Automated Reasoning. Despite being optimized for large knowledge bases and 
performing well in reasoning competitions on such problems, our system often 
fails to find nontrivial proofs in reasonable time in case a large knowledge base 
is used. The main approaches here would be (a) learning to find a proof, based 
on the experience of previous proofs (see [19] for an example), (b) using machine 
learning along with measures of semantic relatedness of formulas to the
assumption and the question (see [6] for an example), and (c) using LLMs to predict
intermediate results or relevant facts and rules. A significant boost in terms of
usability could be achieved by integrating external systems like databases and
scientific computing with the automated reasoners.


The Knowledge Base. Publicly available knowledge bases do not focus on for- 
malizing a basic world model, arguably critical for common-sense reasoning. It 
is possible that a core part needs to be built by hand. On the other hand, the 
existing knowledge bases along with large text corpuses can be extended by 
creating crucial new uncertain rules using both simpler statistical methods and 
more complex ML techniques: see [8] for a review. 


7 Summary and Future Work 


We have described an implementation of a full natural language inference and 
question answering pipeline built around an extended first order reasoner. The 
system is capable of understanding relatively simple sentences and giving rea- 
sonable answers to questions, including the types currently out of scope of the 
capabilities of LLMs. We plan to enhance the capabilities of the system by 
incorporating machine learning techniques into the components of the pipeline, while
keeping the overall architecture, including the semantic parser, word knowledge 
and a reasoner. At the time of this writing we are experimenting with using 
off-the-shelf LLMs without finetuning, but with a suitable prompt, to split and 
simplify complex sentences to a degree where our semantic parser is able to 
properly convert the meaning of the resulting sentences to logic. 


References 


1. Abzianidze, L.: Solving textual entailment with the theorem prover for natural
language. Appl. Math. Inf. 25(2), 1-15 (2020).
https://www.viam.science.tsu.ge/Ami/2020_2/8_Lasha.pdf

2. Abzianidze, L., Kogkalidis, K.: A logic-based framework for natural language
inference in Dutch. CoRR abs/2110.03323 (2021). https://arxiv.org/abs/2110.03323

3. Arnaout, H., Razniewski, S., Weikum, G., Pan, J.Z.: Uncommonsense: informative
negative knowledge about everyday concepts. In: Hasan, M.A., Xiong, L.
(eds.) Proceedings of the 31st ACM International Conference on Information &
Knowledge Management, Atlanta, GA, USA, 17-21 October 2022, pp. 37-46. ACM
(2022). https://doi.org/10.1145/3511808.3557484

4. Basu, K., Varanasi, S.C., Shakerin, F., Gupta, G.: Square: Semantics-based
question answering and reasoning engine. CoRR abs/2009.09158 (2020).
https://arxiv.org/abs/2009.10239

5. De Marneffe, M.C., Manning, C.D., Nivre, J., Zeman, D.: Universal dependencies.
Comput. Linguist. 47(2), 255-308 (2021)

6. Furbach, U., Kramer, T., Schon, C.: Names are not just sound and smoke: word
embeddings for axiom selection. In: Fontaine, P. (ed.) CADE 2019. LNCS (LNAI),
vol. 11716, pp. 250-268. Springer, Cham (2019).
https://doi.org/10.1007/978-3-030-29436-6_15

7. Furbach, U., Glöckner, I., Pelzer, B.: An application of automated reasoning in
natural language question answering. AI Commun. 23(2-3), 241-265 (2010)


8. Han, X., et al.: More data, more relations, more context and more openness: a
review and outlook for relation extraction. In: Proceedings of the 1st Conference
of the Asia-Pacific Chapter of the Association for Computational Linguistics and
the 10th International Joint Conference on Natural Language Processing,
pp. 745-758 (2020)

9. Hwang, J.D., et al.: (comet-) atomic 2020: on symbolic and neural commonsense
knowledge graphs. In: Proceedings of the AAAI Conference on Artificial
Intelligence, vol. 35, pp. 6384-6392 (2021)

10. Järv, P., Tammet, T., Verrev, M., Draheim, D.: Knowledge integration for
commonsense reasoning with default logic. In: Proceedings of the 14th International
Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge
Management - KEOD, pp. 148-155. INSTICC, SciTePress (2022).
https://doi.org/10.5220/0011532200003335

11. Jiang, A.Q., et al.: Draft, sketch, and prove: Guiding formal theorem provers with
informal proofs. CoRR abs/2210.12283 (2022). https://arxiv.org/abs/2210.12283

12. Kalyanpur, A., Breloff, T., Ferrucci, D.A., Lally, A., Jantos, J.: Braid: Weaving
symbolic and statistical knowledge into coherent logical explanations. CoRR
abs/2011.13354 (2020). https://arxiv.org/abs/2011.13354

13. McCoy, T., Pavlick, E., Linzen, T.: Right for the wrong reasons: diagnosing
syntactic heuristics in natural language inference. In: Proceedings of the 57th Annual
Meeting of the Association for Computational Linguistics, pp. 3428-3448.
Association for Computational Linguistics (2019)

14. Mialon, G., et al.: Augmented language models: a survey. CoRR abs/2302.07842
(2023). https://arxiv.org/abs/2302.07842

15. Mishra, B.D., Tandon, N., Clark, P.: Domain-targeted, high precision knowledge
extraction. Trans. Assoc. Comput. Linguist. 5, 233-246 (2017).
https://doi.org/10.1162/tacl_a_00058

16. Nguyen, T.P., Razniewski, S., Romero, J., Weikum, G.: Refined commonsense
knowledge from large-scale web contents. IEEE Trans. Knowl. Data Eng. (2022).
https://doi.org/10.1109/TKDE.2022.3206505

17. Paulson, L.C.: Isabelle: A Generic Theorem Prover. Springer, Cham (1994)

18. Pendharkar, D., Basu, K., Shakerin, F., Gupta, G.: An asp-based approach to
answering natural language questions for texts. Theory Pract. Logic Programm.
22(3), 419-443 (2022). https://arxiv.org/abs/2009.10239

19. Piepenbrock, J., Heskes, T., Janota, M., Urban, J.: Guiding an automated theorem
prover with neural rewriting. In: Blanchette, J., Kovács, L., Pattinson, D. (eds.)
IJCAR 2022. Lecture Notes in Computer Science, vol. 13385, pp. 597-617. Springer,
Cham (2022). https://doi.org/10.1007/978-3-031-10769-6_35

20. Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D.: Stanza: A python
natural language processing toolkit for many human languages. CoRR abs/2003.07082
(2020). https://arxiv.org/abs/2003.07082

21. Rajasekharan, A., Zeng, Y., Padalkar, P., Gupta, G.: Reliable natural language
understanding with large language models and answer set programming. CoRR
abs/2302.03780 (2023). https://arxiv.org/abs/2302.03780

22. Ramachandran, D., Reagan, P., Goolsbey, K.: First-orderized researchcyc:
expressivity and efficiency in a common-sense ontology. In: AAAI Workshop on Contexts
and Ontologies: Theory, Practice and Applications, pp. 33-40 (2005)

23. Reiter, R.: A logic for default reasoning. Artif. Intell. 13(1-2), 81-132 (1980)

24. Romero, J., Razniewski, S., Pal, K., Pan, J.Z., Sakhadeo, A., Weikum, G.:
Commonsense properties from query logs and question answering forums. In: Zhu, W.,
et al. (eds.) Proceedings of CIKM 2019 - the 28th ACM International Conference
on Information and Knowledge Management, pp. 1411-1420. ACM (2019)

25. Speer, R., Chin, J., Havasi, C.: ConceptNet 5.5: an open multilingual graph of
general knowledge. In: Singh, S.P., Markovitch, S. (eds.) Proc. of AAAI 2017 - the
31st AAAI Conference on Artificial Intelligence, pp. 4444-4451. AAAI (2017)

26. Tammet, T.: GKC: a reasoning system for large knowledge bases. In: Fontaine, P.
(ed.) CADE 2019. LNCS (LNAI), vol. 11716, pp. 538-549. Springer, Cham (2019).
https://doi.org/10.1007/978-3-030-29436-6_32

27. Tammet, T., Draheim, D., Järv, P.: Gk: implementing full first order default logic
for commonsense reasoning (system description). In: Blanchette, J., Kovács, L.,
Pattinson, D. (eds.) IJCAR 2022. LNCS, vol. 13385, pp. 300-309. Springer, Cham
(2022). https://doi.org/10.1007/978-3-031-10769-6_18

28. Tammet, T., Draheim, D., Järv, P.: Confidences for commonsense reasoning. In:
Platzer, A., Sutcliffe, G. (eds.) CADE 2021. LNCS (LNAI), vol. 12699, pp. 507-524.
Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79876-5_29

29. Tammet, T., Sutcliffe, G.: Combining JSON-LD with first order logic. In: 2021
IEEE 15th International Conference on Semantic Computing (ICSC), pp. 256-261.
IEEE (2021)

30. Tandon, N., de Melo, G., Weikum, G.: Webchild 2.0: fine-grained commonsense
knowledge distillation. In: Bansal, M., Ji, H. (eds.) Proceedings of ACL 2017,
System Demonstrations, pp. 115-120. Association for Computational Linguistics
(2017). https://doi.org/10.18653/v1/P17-4020

31. Wang, C., Bos, J.: Comparing neural meaning-to-text approaches for Dutch.
Comput. Linguist. Neth. 12, 269-286 (2022)

32. West, P., et al.: Symbolic knowledge distillation: from general language models to
commonsense models. CoRR abs/2110.07178 (2021).
https://arxiv.org/abs/2110.07178

33. Winter, Y., Zwarts, J.: Event semantics and abstract categorial grammar. In:
Kanazawa, M., Kornai, A., Kracht, M., Seki, H. (eds.) MOL 2011. LNCS (LNAI),
vol. 6878, pp. 174-191. Springer, Heidelberg (2011).
https://doi.org/10.1007/978-3-642-23211-4_11

34. Wu, Y., et al.: Autoformalization with large language models. Adv. Neural. Inf.
Process. Syst. 35, 32353-32368 (2022)


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 


The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 



Combining Combination Properties: An 
Analysis of Stable Infiniteness, Convexity, 
and Politeness 


Guilherme V. Toledo¹, Yoni Zohar¹, and Clark Barrett²


¹ Bar-Ilan University, Ramat Gan, Israel
guivtoledo@gmail.com, yoni.zohar@cs.tau.ac.il
² Stanford University, Stanford, USA
barrett@cs.stanford.edu


Abstract. We make two contributions to the study of theory combina- 
tion in satisfiability modulo theories. The first is a table of examples for the 
combinations of the most common model-theoretic properties in theory 
combination, namely stable infiniteness, smoothness, convexity, finite wit- 
nessability, and strong finite witnessability (and therefore politeness and 
strong politeness as well). All of our examples are sharp, in the sense that 
we also offer proofs that no theories are available within simpler signatures. 
This table significantly progresses the current understanding of the vari- 
ous properties and their interactions. The most remarkable example in this 
table is of a theory over a single sort that is polite but not strongly polite 
(the existence of such a theory was only known until now for two-sorted 
signatures). The second contribution is a new combination theorem show- 
ing that in order to apply polite theory combination, it is sufficient for one 
theory to be stably infinite and strongly finitely witnessable, thus showing 
that smoothness is not a critical property in this combination method. This 
result has the potential to greatly simplify the process of showing which 
theories can be used in polite combination, as showing stable infiniteness 
is considerably simpler than showing smoothness. 


Keywords: Satisfiability modulo theories - Theory combination - 
Theory politeness 


1 Introduction 


Theory combination focuses on the following problem: given procedures for deter- 
mining the satisfiability of formulas over individual theories, can we find a pro- 
cedure for the combined theory? One of the foundational results in this field 
is in Nelson and Oppen’s paper [9], where the authors show how to combine 
theories with disjoint signatures as long as they are both stably infinite, i.e., for 
every quantifier-free formula that is satisfied in the theory, there is an infinite
interpretation of the theory that satisfies it.


With the introduction of stable infiniteness was born the notion of identifying 
model-theoretic properties that enable theory combination. It soon became clear, 
however, that this first step was insufficient, since some important theories with
real-world applications (like the theories of bit-vectors and finite datatypes) turned 
out not to be stably infinite. Early attempts to find alternatives for stable infinite- 
ness in theory combination included the introduction of gentle [5], shiny [12], and 
flexible [7] theories. We focus here on the notion of politeness, which forms the basis 
for theory combination in the state-of-the-art SMT solver cvc5 [1]. 

First considered in [10], polite theories were originally defined as those theo- 
ries that are both smooth and finitely witnessable. Both notions are much harder 
to test for than stable infiniteness, but once a theory is known to be polite, it 
can be combined with any other theory, even non-stably-infinite ones. 

A small problem in the proof of the main result of [10] was corrected in
later work [6], which introduces a slightly different, stricter definition
of politeness, together with a correct proof showing that polite theories can be
combined with arbitrary theories. Following [4], we refer to theories satisfying
the new definition as strongly polite, which is defined as being both smooth and
strongly finitely witnessable; with that in mind, we call theories satisfying the
earlier definition simply polite.

For some time, it was not known whether there exists a theory that is polite 
but not strongly polite. Then, in 2021 Sheng et al. [11] provided an example. 
This suggests the need for a more thorough analysis of properties such as stable 
infiniteness, smoothness, finite witnessability, and strong finite witnessability, as 
they appear to interact with each other in sometimes surprising or unforeseeable 
ways. We add to this list convexity, which was shown to be closely related to 
stable infiniteness in [2]. 

In this paper, we provide an exhaustive analysis, with examples whenever 
possible, of whether and how these properties can coexist. Some combinations 
are obviously impossible, such as a strongly finitely witnessable theory that is 
not finitely witnessable; the feasibility of other combinations is more elusive; for 
instance, it is initially unclear whether there can be a one-sorted, non-stably- 
infinite theory that is also not finitely witnessable (we show that this is also 
impossible). A main result is a comprehensive table describing what is known 
about all possible combinations of these properties. 

While filling in the table, we were also able to improve polite
combination: at the cost of a slightly more involved proof, we simplify
the main polite theory combination result. We show that in order to combine
theories, it is enough for one theory to be stably infinite and strongly finitely
witnessable; there is no need for smoothness. This result simplifies the process
of qualifying a theory for polite combination, as showing stable infiniteness is
considerably simpler than showing smoothness.

The paper is organized as follows. Section 2 defines the basic notions we will 
make use of throughout the paper. Section 3 proves several theorems showing 
the unfeasibility of certain combinations of properties. Section 4 describes the 
example theories that populate the feasible entries of the table. Section 5 offers 
a new combination theorem. And finally, Sect. 6 gives concluding remarks and
directions for future work.¹


1 Due to space limitations, proofs are included in an appendix to [13]. 
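$\psi^{\sigma}_{\geq n} = \exists \vec{x}.\ \bigwedge_{1 \leq i < j \leq n} \neg(x_i = x_j)$,
$\psi^{\sigma}_{\leq n} = \exists \vec{x}.\ \forall y.\ \bigvee_{i=1}^{n} y = x_i$,
$\psi^{\sigma}_{=n} = \psi^{\sigma}_{\geq n} \wedge \psi^{\sigma}_{\leq n}$


Fig. 1. Cardinality Formulas. $\vec{x}$ stands for $x_1, \ldots, x_n$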


2 Preliminary Notions 


2.1 First-Order Signatures and Structures 


A many-sorted signature $\Sigma$ is a triple formed by a countable set $S_\Sigma$ of sorts, a
countable set of function symbols $F_\Sigma$, and a countable set of predicate symbols
$P_\Sigma$ which contains, for every sort $\sigma \in S_\Sigma$, an equality symbol $=_\sigma$ (often denoted
by $=$); each function symbol has an arity $\sigma_1 \times \cdots \times \sigma_n \to \sigma$ and each predicate
symbol an arity $\sigma_1 \times \cdots \times \sigma_n$, where $\sigma_1, \ldots, \sigma_n, \sigma \in S_\Sigma$ and $n \in \mathbb{N}$. Each equality
symbol $=_\sigma$ has arity $\sigma \times \sigma$. A signature with no function or predicate symbols
other than equalities is called empty.

A many-sorted signature $\Sigma$ is one-sorted if $S_\Sigma$ has one element; we may refer
to many-sorted signatures simply as signatures. Two signatures are said to be
disjoint if they share only sorts and equality symbols.

We assume for each sort in $S_\Sigma$ a distinct countably infinite set of variables,
and define terms, literals, and formulas (atomic or not) in the usual way. If $s$ is a
function symbol of arity $\sigma \to \sigma$ and $x$ is a variable of sort $\sigma$, we define recursively
the term $s^k(x)$, for $k \in \mathbb{N}$, as follows: $s^0(x) = x$, and $s^{k+1}(x) = s(s^k(x))$. We
denote the set of free variables of sort $\sigma$ in a formula $\varphi$ by $vars_\sigma(\varphi)$, and given
$S \subseteq S_\Sigma$, $vars_S(\varphi) = \bigcup_{\sigma \in S} vars_\sigma(\varphi)$ (we use $vars(\varphi)$ as shorthand for $vars_{S_\Sigma}(\varphi)$).

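A $\Sigma$-structure $\mathcal{A}$ is composed of sets $\sigma^{\mathcal{A}}$ for each sort $\sigma \in S_\Sigma$, called the
domain of $\sigma$, equipped with interpretations $f^{\mathcal{A}}$ and $P^{\mathcal{A}}$ of the function and
predicate symbols, in a way that respects their arities. Furthermore, $=_\sigma^{\mathcal{A}}$ must
be the identity on $\sigma^{\mathcal{A}}$.

A $\Sigma$-interpretation $\mathcal{A}$ is an extension of a $\Sigma$-structure that also interprets
variables, with the value of a variable $x$ of sort $\sigma$ being an element $x^{\mathcal{A}}$ of $\sigma^{\mathcal{A}}$; we
will sometimes say that an interpretation $\mathcal{B}$ is an interpretation on a structure
$\mathcal{A}$ (over the same signature) to mean that $\mathcal{B}$ has $\mathcal{A}$ as its underlying structure.
We write $\alpha^{\mathcal{A}}$ for the interpretation of the term $\alpha$ under $\mathcal{A}$; if $\Gamma$ is a set of terms,
we define $\Gamma^{\mathcal{A}} = \{\alpha^{\mathcal{A}} : \alpha \in \Gamma\}$. We write $\mathcal{A} \vDash \varphi$ if $\mathcal{A}$ satisfies $\varphi$. A formula $\varphi$ is
called satisfiable if it is satisfied by some interpretation $\mathcal{A}$.

We shall make use of standard cardinality formulas, given in Fig. 1. $\psi^{\sigma}_{\geq n}$ is
only satisfied by a structure $\mathcal{A}$ if $|\sigma^{\mathcal{A}}|$ is at least $n$, $\psi^{\sigma}_{\leq n}$ is only satisfied by
$\mathcal{A}$ if $|\sigma^{\mathcal{A}}|$ is at most $n$, and $\psi^{\sigma}_{=n}$ is only satisfied by $\mathcal{A}$ if $|\sigma^{\mathcal{A}}|$ is exactly $n$. In
one-sorted signatures, we may drop $\sigma$ from the formulas, giving us $\psi_{\geq n}$, $\psi_{\leq n}$
and $\psi_{=n}$.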
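
As an informal illustration (not part of the paper), the meaning of the three cardinality formulas can be checked by brute force over a small finite domain. The sketch below evaluates the existential and universal quantifiers by enumeration; the function names are hypothetical and the domain is assumed to be finite and non-empty.

```python
from itertools import product

def at_least(domain, n):
    """psi_{>=n}: some x_1..x_n in the domain are pairwise distinct."""
    return any(len(set(xs)) == n for xs in product(domain, repeat=n))

def at_most(domain, n):
    """psi_{<=n}: some x_1..x_n such that every y in the domain equals some x_i."""
    return any(all(y in xs for y in domain) for xs in product(domain, repeat=n))

def exactly(domain, n):
    """psi_{=n}: the conjunction of the two formulas above."""
    return at_least(domain, n) and at_most(domain, n)

domain = range(3)  # a three-element domain, chosen only for illustration
print([n for n in range(1, 6) if at_least(domain, n)])
print([n for n in range(1, 6) if at_most(domain, n)])
print([n for n in range(1, 6) if exactly(domain, n)])
```

On a domain of size three, the first formula holds exactly for n up to 3, the second exactly for n from 3 upwards, and the third only for n equal to 3.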

The following lemmas are generalizations of the standard compactness and
downward Skolem-Löwenheim theorems of first-order logic to the many-sorted
case. They are proved in [8].

Lemma 1 ([8]). A set of formulas is satisfiable iff each of its finite subsets is
satisfiable.


Lemma 2 ([8]). If a set of formulas is satisfiable, there exists an interpretation
$\mathcal{A}$ which satisfies it and where $\sigma^{\mathcal{A}}$ is countable whenever it is infinite, for every
sort $\sigma$.


A theory $\mathcal{T}$ is a class of all $\Sigma$-structures that satisfy some set of closed
formulas (formulas without free variables), called the axiomatization of $\mathcal{T}$ which
we denote as $Ax(\mathcal{T})$; such structures will be called the models of $\mathcal{T}$, a model being
called trivial when $\sigma^{\mathcal{A}}$ is a singleton for some sort $\sigma$ in $S_\Sigma$. A $\Sigma$-interpretation
$\mathcal{A}$ whose underlying structure is in $\mathcal{T}$ is called a $\mathcal{T}$-interpretation. A formula is
said to be $\mathcal{T}$-satisfiable if there is a $\mathcal{T}$-interpretation that satisfies it; a set of
formulas is $\mathcal{T}$-satisfiable if there is a $\mathcal{T}$-interpretation that satisfies each of its
elements. Two formulas are $\mathcal{T}$-equivalent when every $\mathcal{T}$-interpretation satisfies
one if and only if it satisfies the other. We write $\vdash_{\mathcal{T}} \varphi$ and say that $\varphi$ is $\mathcal{T}$-valid
if $\mathcal{A} \vDash \varphi$ for every $\mathcal{T}$-interpretation $\mathcal{A}$. Let $\Sigma_1$ and $\Sigma_2$ be disjoint signatures;
by $\Sigma = \Sigma_1 \cup \Sigma_2$, we mean the signature with the union of the sorts, function
symbols, and predicate symbols of $\Sigma_1$ and $\Sigma_2$, all arities preserved. Given a
$\Sigma_1$-theory $\mathcal{T}_1$ and a $\Sigma_2$-theory $\mathcal{T}_2$, the $\Sigma_1 \cup \Sigma_2$-theory $\mathcal{T} = \mathcal{T}_1 \oplus \mathcal{T}_2$ is the theory
axiomatized by the union of the axiomatizations of $\mathcal{T}_1$ and $\mathcal{T}_2$.


2.2 Model-Theoretic Properties 


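Let $\Sigma$ be a signature. A $\Sigma$-theory $\mathcal{T}$ is said to be stably infinite w.r.t. $S \subseteq S_\Sigma$ if,
for every $\mathcal{T}$-satisfiable quantifier-free formula $\varphi$, there exists a $\mathcal{T}$-interpretation
$\mathcal{A}$ satisfying $\varphi$ such that, for each $\sigma \in S$, $\sigma^{\mathcal{A}}$ is infinite. $\mathcal{T}$ is smooth w.r.t. $S \subseteq S_\Sigma$
when, for every quantifier-free formula $\varphi$, $\mathcal{T}$-interpretation $\mathcal{A}$ satisfying $\varphi$, and
function $\kappa$ from $S$ to the class of cardinals such that $\kappa(\sigma) \geq |\sigma^{\mathcal{A}}|$ for every $\sigma \in S$,
there exists a $\mathcal{T}$-interpretation $\mathcal{B}$ satisfying $\varphi$ with $|\sigma^{\mathcal{B}}| = \kappa(\sigma)$, for every $\sigma \in S$.


Theorem 1. Let $\Sigma$ be a signature, $S \subseteq S_\Sigma$, and $\mathcal{T}$ a $\Sigma$-theory. If $\mathcal{T}$ is smooth
w.r.t. $S$, then it is also stably infinite w.r.t. $S$.


For a finite set of sorts $S$, finite sets of variables $V_\sigma$ of sort $\sigma$ for each $\sigma \in S$,
and equivalence relations $E_\sigma$ on $V_\sigma$, the arrangement on $V = \bigcup_{\sigma \in S} V_\sigma$ induced by
$E = \bigcup_{\sigma \in S} E_\sigma$, denoted by $\delta_V$ or $\delta_V^E$, is the quantifier-free formula given by
$\delta_V = \bigwedge_{\sigma \in S} \left[ \bigwedge_{x E_\sigma y} (x = y) \wedge \bigwedge_{x \overline{E_\sigma} y} \neg(x = y) \right]$, where $\overline{E_\sigma}$ denotes the complement of
the equivalence relation $E_\sigma$.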
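
To make the notion of an arrangement concrete, the following sketch (an illustration, not taken from the paper) builds the literals of an arrangement for a single sort from a partition of the variables, i.e. from the equivalence classes of the relation. The function name and the textual rendering of literals are assumptions; trivial literals such as x = x and symmetric duplicates are omitted.

```python
from itertools import combinations

def arrangement(blocks):
    """Literals of the arrangement induced by a partition `blocks` of variables.

    Two variables are equated when they lie in the same block and are
    declared distinct otherwise.
    """
    block_of = {v: i for i, block in enumerate(blocks) for v in block}
    literals = []
    for x, y in combinations(sorted(block_of), 2):
        op = "=" if block_of[x] == block_of[y] else "!="
        literals.append(f"{x} {op} {y}")
    return literals

# Partition {x, y} | {z}: x and y are identified, z is distinct from both.
print(" AND ".join(arrangement([["x", "y"], ["z"]])))
```

An arrangement therefore fixes, for every pair of variables, whether they denote the same element, which is why it determines the number of distinct values taken by the variables in any interpretation satisfying it.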

A theory $\mathcal{T}$ is said to be finitely witnessable w.r.t. the set of sorts $S \subseteq
S_\Sigma$ when there exists a function $wit$, called a witness, from the quantifier-free
formulas into themselves that is computable and satisfies for every
quantifier-free formula $\varphi$: (i) $\varphi$ and $\exists \vec{w}.\ wit(\varphi)$ are $\mathcal{T}$-equivalent, where $\vec{w} = vars(wit(\varphi)) \setminus
vars(\varphi)$; and (ii) if $wit(\varphi)$ is $\mathcal{T}$-satisfiable, then there exists a $\mathcal{T}$-interpretation
$\mathcal{A}$ satisfying $wit(\varphi)$ such that $\sigma^{\mathcal{A}} = vars_\sigma(wit(\varphi))^{\mathcal{A}}$ for each $\sigma \in S$. $\mathcal{T}$ is said
to be strongly finitely witnessable if it has a strong witness $wit$, which has the
properties of a witness with the exception of (ii), satisfying instead: (ii') given
a finite set of variables $V$ and an arrangement $\delta_V$ on $V$, if $wit(\varphi) \wedge \delta_V$ is
$\mathcal{T}$-satisfiable, then there exists a $\mathcal{T}$-interpretation $\mathcal{A}$ satisfying $wit(\varphi) \wedge \delta_V$ such
that $\sigma^{\mathcal{A}} = vars_\sigma(wit(\varphi) \wedge \delta_V)^{\mathcal{A}}$ for all $\sigma \in S$.

From the definitions, the following theorem directly follows: 


Theorem 2. Let $\Sigma$ be a signature, $S \subseteq S_\Sigma$, and $\mathcal{T}$ a $\Sigma$-theory. If $\mathcal{T}$ is strongly
finitely witnessable w.r.t. $S$ then it is also finitely witnessable w.r.t. $S$.


A theory that is both smooth and finitely witnessable w.r.t. (a set of sorts) 
S is said to be polite w.r.t. S; a theory that is both smooth and strongly finitely 
witnessable w.r.t. S is called strongly polite w.r.t. S. For theories over one-sorted 
empty signatures, we have the following theorem from [11]: 


Theorem 3 ([11]). Every one-sorted theory over the empty signature that is
polite w.r.t. its only sort is strongly polite w.r.t. that sort. 


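A one-sorted theory $\mathcal{T}$ is said to be convex if, for any conjunction of literals
$\varphi$ and any finite set of variables $\{u_1, v_1, \ldots, u_n, v_n\}$, $\vdash_{\mathcal{T}} \varphi \to \bigvee_{i=1}^{n} u_i = v_i$ implies
$\vdash_{\mathcal{T}} \varphi \to u_i = v_i$, for some $i \in [1, n]$.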
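
As a small worked illustration (not from the paper), consider the one-sorted theory over the empty signature whose models have exactly two elements; this theory is chosen here only as an example. With three variables, the disjunction of all pairwise equalities is valid by the pigeonhole principle, yet no single equality is valid, so the theory is not convex. Since all models of this theory are isomorphic, it suffices to enumerate assignments over one two-element domain.

```python
from itertools import product

DOMAIN = (0, 1)                      # the only model, up to isomorphism
VARS = ("x", "y", "z")
PAIRS = (("x", "y"), ("x", "z"), ("y", "z"))

def valid(formula):
    """True iff `formula` holds under every assignment of VARS into DOMAIN."""
    return all(
        formula(dict(zip(VARS, values)))
        for values in product(DOMAIN, repeat=len(VARS))
    )

# The disjunction x = y OR x = z OR y = z.
disjunction_valid = valid(lambda a: any(a[u] == a[v] for u, v in PAIRS))

# Each equality on its own.
single_valid = [valid(lambda a, u=u, v=v: a[u] == a[v]) for u, v in PAIRS]

print(disjunction_valid, single_valid)
```

The disjunction is valid while each individual equality fails under some assignment, which is exactly the situation that the definition of convexity rules out.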

Given a one-sorted theory $\mathcal{T}$, its mincard function takes a quantifier-free
formula $\varphi$ and returns the countable cardinal $\min\{|\sigma^{\mathcal{A}}| : \mathcal{A}$ is a $\mathcal{T}$-interpretation
that satisfies $\varphi\}$.²

Throughout this paper, we will use SI for stably infinite, SM for smooth, 
FW for finitely witnessable, SW for strongly finitely witnessable, and CV for 
convex. 


3 Negative Results 


If it were possible, we would present examples of every combination of proper- 
ties using only the one-sorted empty signature, which is the simplest signature 
imaginable. 

Of course, this is not always possible: smooth theories are necessarily sta- 
bly infinite, and strongly finitely witnessable theories are obligatorily finitely 
witnessable. But there are several other connections we now proceed to show, 
which further restrict the combinations of properties that are possible. 

In Sect. 3.1, we show that, under reasonable conditions, a convex theory must
be stably infinite, while the converse is also true over the empty signature. In
Sect. 3.2, we show that over the empty one-sorted signature, theories that are not 
stably infinite are necessarily finitely witnessable (a somewhat counter-intuitive 
result, since we usually look for theories that are, simultaneously, smooth and 
strongly finitely witnessable) and, more importantly, that stably-infinite and 
strongly finitely witnessable one-sorted theories are also strongly polite. 


2 Note that this definition was generalized in two different ways to the many-sorted 
case in [4] and [10]. However, for our investigation, the single-sorted case is enough. 
3.1 Stable-Infiniteness and Convexity


Convexity is typically defined over one-sorted signatures. Here we offer the fol- 
lowing generalization to arbitrary signatures. 


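Definition 1. A theory $\mathcal{T}$ is said to be convex w.r.t. a set of sorts $S \subseteq S_\Sigma$ if,
for any conjunction of literals $\varphi$ and any finite set of variables $\{u_1, v_1, \ldots, u_n, v_n\}$
with sorts in $S$, if $\vdash_{\mathcal{T}} \varphi \to \bigvee_{i=1}^{n} u_i = v_i$ then $\vdash_{\mathcal{T}} \varphi \to u_i = v_i$, for some $i \in [1, n]$.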


If we assume, as it is often natural to, that our theories have no trivial models, 
then convexity implies stable infiniteness. This is true for the one-sorted case, 
as proved in [2], but also for the many-sorted case as we show here. The proof is
similar, though here we need to account for several sorts at once. In particular, 
the proof relies on Lemma 1. 


Theorem 4. If a $\Sigma$-theory $\mathcal{T}$ is convex w.r.t. some set $S$ of sorts and, for each
$\sigma \in S$, $\vdash_{\mathcal{T}} \psi^{\sigma}_{\geq 2}$, then $\mathcal{T}$ is stably infinite w.r.t. $S$.


Conversely, we may also obtain convexity from stable infiniteness, but only
over empty signatures. 


Theorem 5. Any theory over an empty signature that is stably infinite w.r.t. 
the set of all of its sorts is convex w.r.t. any set of sorts. 


As we shall see in Sect. 4, this result is tight: there are theories over non-empty 
signatures that are stably infinite but not convex. 


3.2 More Connections 


We next present more connections between the properties. First, over the one- 
sorted empty signature, a theory must be either stably infinite or finitely wit- 
nessable. 


Theorem 6. Every one-sorted, non-stably-infinite theory T with an empty sig- 
nature is finitely witnessable w.r.t. its only sort. 


The following theorem shows that for one-sorted theories, strong politeness 
is a corollary of strong finite witnessability and stable infiniteness (rather than 
smoothness). 


Theorem 7. Every one-sorted theory that is stably infinite and strongly finitely 
witnessable w.r.t. its only sort is smooth, and therefore strongly polite w.r.t. that 
sort. 


Generalizing this theorem to the case of many-sorted signatures is left for future 
work. 

Finally, by combining previous results, we can also get the following theorem, 
which relates stable infiniteness, strong finite witnessability, and convexity. 


Fig. 2. A diagram of combinations over a one-sorted, empty signature: gray regions 
are empty. 


Theorem 8. A one-sorted theory T with an empty signature that is neither 
strongly finitely witnessable nor stably infinite w.r.t. its only sort cannot be 
convex. 


To summarize, while Theorem 4 is restricted to structures with no domains 
of cardinality 1, the remaining theorems of this section are not restricted to 
such structures. Theorem 5 applies to empty signatures, Theorem 7 applies to 
one-sorted signatures, and Theorems 6 and 8 apply to signatures that are both 
empty and one-sorted. Put together, we see that many combinations of properties 
for theories over a one-sorted empty signature are actually impossible. This is 
depicted in Fig. 2, in which all areas but the white ones are empty. For example, 
Theorem 6 shows that the area outside the SI and FW circles (representing 
theories that are neither stably infinite nor finitely witnessable) is empty, as every 
theory (over an empty one-sorted signature) must have one of these properties. 
Similarly, Theorem 8 further shows that within the CV (convex) circle, even 
more is empty, namely anything outside the SI and SW circles. 


4 Positive Results 


We now proceed to systematically address all possible combinations of stable- 
infiniteness, smoothness, finite witnessability, strong finite witnessability, and 
convexity. 

The results are summarized in Table 1. Each row corresponds to a possible 
combination of properties, as determined by the truth values in the first five 
columns. For example, in the first row, the entries in the first five columns are 
all true, indicating that in this row, all theory examples must be stably-infinite, 
smooth, finitely witnessable, strongly finitely witnessable, and convex. The rest 
of the columns correspond to different possibilities for the theory signatures: 
either empty or non-empty, and either one-sorted or many-sorted. Again, looking 
at the first row, we see four different theories listed, one for each of the signature 
possibilities. 

Some entries in the table list theorems instead of providing example theories. 
The listed theorems tell us that no example theories exist for these 
entries. For example, lines 3 and 4 cannot provide examples over a one-sorted 
empty signature because of Theorem 3. 

When an example is available, its name is given in the corresponding cell of the 
table. The theories themselves are defined in Sect. 4.1 to 4.4. The examples on 
lines 25, 27 and 31 must have at least one structure with a trivial domain (i.e., 
a domain with exactly one element) because of Theorem 4. 

Lines 9, 10, 13, and 14 cover theories that are stably infinite and strongly 
finitely witnessable but not smooth. We call these unicorn theories because we 
could not find any such theories, nor do we believe they exist, but (ignoring the 
obvious cases ruled out by Theorems 2, 5 and 7) we have no proof that they do 
not exist. 


Definition 2. A unicorn theory is stably infinite and strongly finitely witness- 
able but not smooth. 


Theorem 7 shows that there are no one-sorted unicorn theories. We believe it 
may be possible to provide a generalization of the upward Löwenheim-Skolem 
theorem to many-sorted logic in such a way that it would prove the non-existence 
of unicorn theories, which leads to the following conjecture: 


Conjecture 1. There are no unicorn theories. 


Before defining the theories of Table 1, we introduce the following signatures. 


Definition 3. $\Sigma_1$ is the empty one-sorted signature with sort $\sigma$, $\Sigma_2$ is the empty 
two-sorted signature with sorts $\sigma$ and $\sigma_2$, and $\Sigma_s$ is the one-sorted signature with 
a single unary function symbol s. 


We now describe the theories: Sect. 4.1 describes the theories that are over the 
empty one-sorted signature; Sect. 4.2 then continues to the next column, describ- 
ing theories over many-sorted empty signatures. Some build on the theories of the 
previous column, but some are also new. Section 4.3 describes the next column, 
one-sorted theories over a non-empty signature. Here, we use two constructions 
to generate new theories from previously introduced ones. One construction adds 
a function symbol to an empty signature (in a way that preserves all proper- 
ties), and the second preserves all properties but convexity, making it possible 
to construct non-convex examples in a uniform way. We also present new theo- 
ries when the constructions are not sufficient. Finally, Sect. 4.4 describes theories 
over non-empty many-sorted signatures.3 


3 Proofs that each theory has the claimed properties can be found in the appendix 
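to [13]. 


Table 1. Summary of all possible combinations of theory properties. Shaded cells 
represent impossible combinations. In line 26: n > 1; in line 28: m > 1, n > 1 and 
|m − n| > 1. 

SI | SM | FW | SW | CV | Empty, one-sorted | Empty, many-sorted | Non-empty, one-sorted | Non-empty, many-sorted | No. 
T | T | T | T | T | $T_{\geq n}$ | $(T_{\geq n})^2$ | $(T_{\geq n})_s$ | $((T_{\geq n})^2)_s$ | 1 
T | T | T | T | F | Theorem 5 | Theorem 5 | $(T_{\geq n})_\vee$ | $((T_{\geq n})^2)_\vee$ | 2 
T | T | T | F | T | Theorem 3 | $T_{2,3}$ | $T_f$ | $(T_{2,3})_s$ | 3 
T | T | T | F | F | Theorem 3 | Theorem 5 | $T_f^s$ | $(T_{2,3})_\vee$ | 4 
T | T | F | T | T | Theorem 2 | Theorem 2 | Theorem 2 | Theorem 2 | 5 
T | T | F | T | F | Theorem 2 | Theorem 2 | Theorem 2 | Theorem 2 | 6 
T | T | F | F | T | $T_\infty$ | $(T_\infty)^2$ | $(T_\infty)_s$ | $((T_\infty)^2)_s$ | 7 
T | T | F | F | F | Theorem 5 | Theorem 5 | $(T_\infty)_\vee$ | $((T_\infty)^2)_\vee$ | 8 
T | F | T | T | T | Theorem 7 | Unicorn | Theorem 7 | Unicorn | 9 
T | F | T | T | F | Theorem 7 | Theorem 5 | Theorem 7 | Unicorn | 10 
T | F | T | F | T | $T^{\infty}_{even}$ | $(T^{\infty}_{even})^2$ | $(T^{\infty}_{even})_s$ | $((T^{\infty}_{even})^2)_s$ | 11 
T | F | T | F | F | Theorem 5 | Theorem 5 | $(T^{\infty}_{even})_\vee$ | $((T^{\infty}_{even})^2)_\vee$ | 12 
T | F | F | T | T | Theorem 2 | Theorem 2 | Theorem 2 | Theorem 2 | 13 
T | F | F | T | F | Theorem 2 | Theorem 2 | Theorem 2 | Theorem 2 | 14 
T | F | F | F | T | $T_{n,\infty}$ | $(T_{n,\infty})^2$ | $(T_{n,\infty})_s$ | $((T_{n,\infty})^2)_s$ | 15 
T | F | F | F | F | Theorem 5 | Theorem 5 | $(T_{n,\infty})_\vee$ | $((T_{n,\infty})^2)_\vee$ | 16 
F | T | T | T | T | Theorem 1 | Theorem 1 | Theorem 1 | Theorem 1 | 17 
F | T | T | T | F | Theorem 1 | Theorem 1 | Theorem 1 | Theorem 1 | 18 
F | T | T | F | T | Theorem 1 | Theorem 1 | Theorem 1 | Theorem 1 | 19 
F | T | T | F | F | Theorem 1 | Theorem 1 | Theorem 1 | Theorem 1 | 20 
F | T | F | T | T | Theorems 1 and 2 | Theorems 1 and 2 | Theorems 1 and 2 | Theorems 1 and 2 | 21 
F | T | F | T | F | Theorems 1 and 2 | Theorems 1 and 2 | Theorems 1 and 2 | Theorems 1 and 2 | 22 
F | T | F | F | T | Theorem 1 | Theorem 1 | Theorem 1 | Theorem 1 | 23 
F | T | F | F | F | Theorem 1 | Theorem 1 | Theorem 1 | Theorem 1 | 24 
F | F | T | T | T | $T_{\leq 1}$ | $(T_{\leq 1})^2$ | $(T_{\leq 1})_s$ | $((T_{\leq 1})^2)_s$ | 25 
F | F | T | T | F | $T_{\leq n}$ | $(T_{\leq n})^2$ | $(T_{\leq n})_s$ | $((T_{\leq n})^2)_s$ | 26 
F | F | T | F | T | Theorem 8 | $T_1^{odd}$ | $T^{\neq}_{odd}$ | $(T_1^{odd})_s$ | 27 
F | F | T | F | F | $T_{\langle m,n \rangle}$ | $(T_{\langle m,n \rangle})^2$ | $(T_{\langle m,n \rangle})_s$ | $((T_{\langle m,n \rangle})^2)_s$ | 28 
F | F | F | T | T | Theorem 2 | Theorem 2 | Theorem 2 | Theorem 2 | 29 
F | F | F | T | F | Theorem 2 | Theorem 2 | Theorem 2 | Theorem 2 | 30 
F | F | F | F | T | Theorem 6 | $T_1^{\infty}$ | $T^{\neq}_{1,\infty}$ | $(T_1^{\infty})_s$ | 31 
F | F | F | F | F | Theorem 6 | $T_2^{\infty}$ | $T^{\neq}_{2,\infty}$ | $(T_2^{\infty})_s$ | 32 

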
4.1 Theories over the One-Sorted Empty Signature 
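Table 2. $\Sigma_1$-theories 

Name | Axiomatization 
$T_{\geq n}$ | $\{\psi_{\geq n}\}$ 
$T_\infty$ | $\{\psi_{\geq k} : k \in \mathbb{N}\}$ 
$T^{\infty}_{even}$ | $\{\neg\psi_{=2k+1} : k \in \mathbb{N}\}$ 
$T_{n,\infty}$ | $\{\psi_{=n} \vee \psi_{\geq k} : k \in \mathbb{N}\}$ 
$T_{\leq n}$ | $\{\psi_{\leq n}\}$ 
$T_{\langle m,n \rangle}$ | $\{\psi_{=m} \vee \psi_{=n}\}$ 


Table 3. $\Sigma_2$-theories 

Name | Axiomatization 
$T_{2,3}$ | $\{(\psi^{\sigma}_{=2} \wedge \psi^{\sigma_2}_{\geq k}) \vee (\psi^{\sigma}_{\geq 3} \wedge \psi^{\sigma_2}_{\geq 3}) : k \in \mathbb{N}\}$ 
$T_1^{odd}$ | $\{\psi^{\sigma}_{=1}\} \cup \{\neg\psi^{\sigma_2}_{=2k} : k \in \mathbb{N}\}$ 
$T_1^{\infty}$ | $\{\psi^{\sigma}_{=1}\} \cup \{\psi^{\sigma_2}_{\geq k} : k \in \mathbb{N}\}$ 
$T_2^{\infty}$ | $\{\psi^{\sigma}_{=2}\} \cup \{\psi^{\sigma_2}_{\geq k} : k \in \mathbb{N}\}$ 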


The axiomatizations for theories over the one-sorted empty signature $\Sigma_1$ are 
given in Table 2. We briefly describe them here. 

For each n > 0, $T_{\geq n}$ includes all structures with domains of cardinality at 
least n; $T_\infty$ is the theory including all structures whose domains are infinite; 
$T^{\infty}_{even}$ has structures with either an even or an infinite number of elements in 
their domains and was defined in [11], where it was proved to be finitely 
witnessable, but neither smooth nor strongly finitely witnessable. The proofs 
justifying Table 1 show additionally that it is stably infinite and convex. $T_{n,\infty}$ contains 
those structures whose domains have either exactly n or an infinite number of 
elements; $T_{\leq n}$ includes all structures with at most n elements in their domains; 
and for positive integers m and n, $T_{\langle m,n \rangle}$ has structures whose domains have 
either precisely m elements, or precisely n elements. This completes the first 
column of theory examples. 
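
Since the signature is empty and one-sorted, each of these theories is determined by the set of domain cardinalities it admits. The following minimal sketch encodes the descriptions above as predicates on cardinalities. The function names (`t_geq`, `t_even_inf`, `finite_spectrum`, and so on) are hypothetical and not taken from the paper, and `math.inf` is assumed to stand for any infinite cardinality, so different infinite cardinals are not distinguished.

```python
import math

# Assumption of this sketch: math.inf stands for "some infinite cardinality";
# countable and uncountable domains are not told apart.
INF = math.inf

def t_geq(n):
    """T_{>=n}: domains with at least n elements."""
    return lambda card: card >= n

def t_inf():
    """T_infty: infinite domains only."""
    return lambda card: card == INF

def t_even_inf():
    """T^infty_even: an even or an infinite number of elements."""
    return lambda card: card == INF or card % 2 == 0

def t_n_inf(n):
    """T_{n,infty}: exactly n or infinitely many elements."""
    return lambda card: card == INF or card == n

def t_leq(n):
    """T_{<=n}: at most n elements."""
    return lambda card: card <= n

def t_pair(m, n):
    """T_{<m,n>}: exactly m or exactly n elements."""
    return lambda card: card in (m, n)

def finite_spectrum(theory, bound):
    """Finite cardinalities in [1, bound] admitted by the theory."""
    return [c for c in range(1, bound + 1) if theory(c)]

if __name__ == "__main__":
    for name, th in [("T_>=3", t_geq(3)), ("T_even", t_even_inf()),
                     ("T_<2,5>", t_pair(2, 5))]:
        print(name, finite_spectrum(th, 8), "infinite models:", th(INF))
```

The predicates only describe which structures belong to each theory; establishing the properties listed in Table 1 still requires the proofs referenced in the text.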


Example 1. The theory $T_{\geq n}$ admits all considered properties, while $T_{\langle m,n \rangle}$ 
admits only finite witnessability. 


4.2 Theories over the Two-Sorted Empty Signature 


We next introduce the theories over empty two-sorted signatures. For many 
cases, we can simply add a trivial sort to one of the theories defined in Sect. 4.1. 
When this is not possible, we introduce new theories. 


Adding a Sort to a Theory. Any $\Sigma_1$-theory can be used to generate a 
$\Sigma_2$-theory simply by adding the sort $\sigma_2$ to the signature (without changing the 
axiomatization). This is formalized as follows: 


Definition 4. Let T be a $\Sigma_1$-theory. $(T)^2$ is the $\Sigma_2$-theory axiomatized by 
Ax(T). 


Lemma 3. A $\Sigma_1$-theory T is stably infinite, smooth, finitely witnessable, 
strongly finitely witnessable, or convex w.r.t. $\{\sigma\}$ if and only if $(T)^2$ is, 
respectively, stably infinite, smooth, finitely witnessable, strongly finitely witnessable, 
or convex w.r.t. $\{\sigma, \sigma_2\}$. 


Using Definition 4 and Lemma 3, we can populate many lines in the second 
column of examples by extending the corresponding theory from the previous 
column. 


Example 2. $(T_{\geq n})^2$ is a theory over two sorts, $\sigma$ and $\sigma_2$, whose structures must 
have at least n elements in the domain of $\sigma$ (but have no restrictions on the 
size of the domain of $\sigma_2$). As seen in the first line of Table 1, $T_{\geq n}$ admits all the 
considered properties. By Lemma 3, so does $(T_{\geq n})^2$. 


Additional Theories over $\Sigma_2$. On some lines, e.g., line 3, there is no $\Sigma_1$-theory 
to extend. In such cases, we cannot use Definition 4 to construct a many-sorted 
variant. 

We introduce the theories shown in Table 3 to cover these cases. The theory 
$T_{2,3}$ contains two kinds of structures: (i) structures whose domains both have at 
least 3 elements; and (ii) structures with exactly two elements in the domain of 
$\sigma$ and an infinite number of elements in the domain of $\sigma_2$. The theory $T_1^{odd}$ has 
structures with exactly one element in the domain of $\sigma$ and either an odd or an 
infinite number of elements in the domain of $\sigma_2$. The theory $T_1^{\infty}$ is similar: it has 
structures with exactly one element in the domain of $\sigma$ and an infinite number 
of elements in the domain of $\sigma_2$. Finally, $T_2^{\infty}$ is similar to $T_1^{\infty}$ except that its 
structures have exactly 2 elements in the domain of $\sigma$. 


Example 3. The theory $T_{2,3}$ was first defined in [4] and later used in [11], where 
it was proved to be polite (and therefore smooth, stably infinite, and finitely 
witnessable) without being strongly polite (and therefore not strongly finitely 
witnessable). The justification proofs for Table 1 show that $T_{2,3}$ is convex as 
well.4 


4.3 Theories over a One-Sorted Non-empty Signature 


We continue to the next column, with one-sorted non-empty signatures. 
We first show how to construct theories over a non-empty signature from one-sorted 
theories over the empty signature, while preserving all their properties. We then 
provide a similar construction that generates non-convex theories from the 
theories in the first column of examples. Finally, we introduce additional 
theories not captured by the above constructions. Two of these theories 
are described in more detail at the end of this section. 


Extending a Theory with a Unary Function Symbol While Preserv- 
ing Properties. Whenever we have a theory over an empty signature, we can 
construct a variant of it over a non-empty signature by introducing a function 
symbol and interpreting it as the identity function. This extension preserves all 
the properties that we consider. This is formalized as follows. 


Definition 5. Let $\Sigma_n$ be an empty signature with sorts $S = \{\sigma_1, \ldots, \sigma_n\}$, and 
let T be a $\Sigma_n$-theory. The signature $\Sigma_s^n$ has sorts S and a single unary function 
symbol s of arity $\sigma_1 \rightarrow \sigma_1$, and $(T)_s$ is the $\Sigma_s^n$-theory axiomatized by $Ax(T) \cup 
\{\forall x.\, [s(x) = x]\}$, where x is a variable of sort $\sigma_1$. 


Lemma 4. For every theory T over an empty signature $\Sigma_n$ with sorts $S = 
\{\sigma_1, \ldots, \sigma_n\}$: T is stably infinite, smooth, finitely witnessable, strongly finitely 
witnessable, or convex w.r.t. S if and only if $(T)_s$ is, respectively, stably infinite, 
smooth, finitely witnessable, strongly finitely witnessable, or convex w.r.t. S. 


We use the operator $(\cdot)_s$ in various places in Table 1 in order to obtain examples 
in non-empty signatures from existing examples over $\Sigma_1$ and $\Sigma_2$. 


Example 4. $(T_{\geq n})_s$ is a one-sorted theory, whose structures have at least n 
elements and interpret the function symbol s as the identity. As seen above, $T_{\geq n}$ 
admits all the considered properties. By Lemma 4, so does $(T_{\geq n})_s$. 


4 We thank Oded Padon for raising the question of whether there exists a theory that 
is polite and convex, but not strongly polite. 


Making a Theory Non-convex. The last general construction that we present 
aims at taking a theory and creating a non-convex variant of it while preserving 
the other properties we consider. This can be done with the addition of a single 
unary function symbol s. To define such a theory, we make use of the formula 
$\psi_\vee$ from Fig. 3. Intuitively, $\psi_\vee$ states that in an interpretation A in which it 
holds, $s^A(s^A(a))$ must equal either $s^A(a)$ or a itself; in other words, either 
$a = s^A(a) = s^A(s^A(a))$, $a = s^A(s^A(a)) \neq s^A(a)$, or $a \neq s^A(a) = s^A(s^A(a))$, as 
shown in Fig. 4. 


$\psi_\vee = \forall x.\, [(s^2(x) = x) \vee (s^2(x) = s(x))]$ 


Fig. 3. The formula $\psi_\vee$ for non-convex theories. 


This is especially useful for defining non-convex theories, since $(s^2(x) = x) \vee 
(s^2(x) = s(x))$ is valid in the theory, but neither $s^2(x) = x$ nor $s^2(x) = s(x)$ is. 
Notice, of course, that non-convexity is only possible when there are at least two 
elements available in the domain; otherwise, all equalities are satisfied. 


Fig. 4. Possible scenarios when $\psi_\vee$ holds. 
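
The three scenarios, and the reason the formula yields non-convexity, can be checked by brute force on small finite domains. The sketch below is only an illustration: interpretations of s are encoded as tuples with `s[a]` the image of element `a`, and all function names (`satisfies_psi_or`, `scenario`, `is_involution`, `is_idempotent`) are hypothetical.

```python
from itertools import product

def satisfies_psi_or(s):
    """psi_or: for every a, s(s(a)) equals a or s(a)."""
    return all(s[s[a]] in (a, s[a]) for a in range(len(s)))

def scenario(s, a):
    """Classify element a into one of the three scenarios, or None."""
    sa, ssa = s[a], s[s[a]]
    if a == sa == ssa:
        return "fixed"      # a = s(a) = s(s(a))
    if a == ssa != sa:
        return "swap"       # a = s(s(a)) != s(a)
    if a != sa == ssa:
        return "collapse"   # a != s(a) = s(s(a))
    return None

def is_involution(s):
    """Does s^2(x) = x hold for every x?"""
    return all(s[s[a]] == a for a in range(len(s)))

def is_idempotent(s):
    """Does s^2(x) = s(x) hold for every x?"""
    return all(s[s[a]] == s[a] for a in range(len(s)))

if __name__ == "__main__":
    # psi_or holds exactly when every element falls into one scenario.
    for n in range(1, 5):
        for s in product(range(n), repeat=n):
            covered = all(scenario(s, a) is not None for a in range(n))
            assert satisfies_psi_or(s) == covered

    # Two models of psi_or over a two-element domain:
    swap, collapse = (1, 0), (0, 0)
    print(is_involution(swap), is_idempotent(swap))
    print(is_involution(collapse), is_idempotent(collapse))
```

The two-element models `swap` and `collapse` both satisfy the disjunction, but each falsifies one of the two disjuncts, which is why at least two elements are needed.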


Definition 6. Let T be a theory over an empty signature $\Sigma_n$ with sorts $S = 
\{\sigma_1, \ldots, \sigma_n\}$. Then $(T)_\vee$ is the $\Sigma_s^n$-theory axiomatized by $Ax(T) \cup \{\psi_\vee\}$. 


Lemma 5. Let T be a theory over an empty signature $\Sigma_n$ with sorts $S = 
\{\sigma_1, \ldots, \sigma_n\}$. Then: $(T)_\vee$ is stably infinite, smooth, finitely witnessable, or 
strongly finitely witnessable w.r.t. S if and only if T is, respectively, stably 
infinite, smooth, finitely witnessable, or strongly finitely witnessable w.r.t. S. In 
addition, if T has a model A with $|\sigma_1^A| \geq 2$, $(T)_\vee$ is not convex with respect to 
S. 


Example 5. The theory $(T_{\geq n})_\vee$ is one-sorted, and its structures have at least n 
elements. They interpret the symbol s in a way that satisfies $\psi_\vee$. In particular, for 
each element a of the domain, one of the scenarios from Fig. 4 holds. According 
to Lemma 5, since $T_{\geq n}$ admits all properties, $(T_{\geq n})_\vee$ admits all properties but 
convexity. 


Additional Theories over $\Sigma_s$. Whenever there is a $\Sigma_1$-theory with some 
properties, we can obtain a $\Sigma_s$-theory with the same properties using one of the 
techniques above. To cover cases for which there is no corresponding $\Sigma_1$-theory, 
we use the theories presented in Table 4 and described below. 


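Table 4. $\Sigma_s$-theories 


Name | Axiomatization 
$T_f$ | $\{[\psi^{=}_{\geq f_1(k)} \wedge \psi^{\neq}_{\geq f_0(k)}] \vee \bigvee_{i=1}^{k} [\psi^{=}_{=f_1(i)} \wedge \psi^{\neq}_{=f_0(i)}] : k \in \mathbb{N} \setminus \{0\}\}$ 
$T_f^s$ | $Ax(T_f) \cup \{\psi_\vee\}$ 
$T^{\neq}_{odd}$ | $\{\psi_{=1} \vee [\neg\psi_{=2k} \wedge \forall x.\, \neg(s(x) = x)] : k \in \mathbb{N}\}$ 
$T^{\neq}_{1,\infty}$ | $\{\psi_{=1} \vee [\psi_{\geq k} \wedge \forall x.\, \neg(s(x) = x)] : k \in \mathbb{N}\}$ 
$T^{\neq}_{2,\infty}$ | $\{[\psi_{=2} \wedge \forall x.\, (s(x) = x)] \vee [\psi_{\geq k} \wedge \forall x.\, \neg(s(x) = x)] : k \in \mathbb{N}\}$ 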


We start with $T^{\neq}_{odd}$, $T^{\neq}_{1,\infty}$, and $T^{\neq}_{2,\infty}$, deferring the discussion on $T_f$ and $T_f^s$ 
to Sect. 4.3. The theory $T^{\neq}_{odd}$ has structures A with either an infinite or an odd 
number of elements and with the property that if A is not trivial, then $s^A(a) \neq a$ 
for all $a \in \sigma^A$. The theory $T^{\neq}_{1,\infty}$ has all structures A that either: (i) are trivial; 
or (ii) have infinitely many elements and for which $s^A(a) \neq a$ for each $a \in \sigma^A$. 
Similarly, $T^{\neq}_{2,\infty}$ has structures A that either: (i) have exactly two elements and 
interpret s as the identity; or (ii) have infinitely many elements and interpret s 
in such a way that $s^A(a) \neq a$ for all $a \in \sigma^A$. 


On the Theories $T_f$ and $T_f^s$. We now introduce the theories $T_f$ and $T_f^s$. The 
importance of these theories is that both of them are one-sorted theories that 
are polite but not strongly polite (the first is also convex and the second is not). 
Their existence improves on the result of [11], which introduced a two-sorted 
theory that is polite but not strongly polite (namely $T_{2,3}$). 

For their axiomatizations, we use the formulas from Fig. 5, in which s is a 
unary function symbol. $\psi^{=}_{\geq n}$ ($\psi^{=}_{=n}$) states that a structure A has at least (exactly) 
n elements a satisfying $s^A(a) = a$; similarly, $\psi^{\neq}_{\geq n}$ ($\psi^{\neq}_{=n}$) states that a structure 
A has at least (exactly) n elements a satisfying $s^A(a) \neq a$. 

Further, the axiomatization requires a function f from positive integers to 
{0,1} that is not computable with the property that for k > 0, f maps half of 
the numbers in the interval $[1, 2^k]$ to 1 and the other half to 0. The existence of 
such a function is formalized below. We start by defining counting functions $f_0$ 
and $f_1$. 


Definition 7. Let $f : \mathbb{N} \setminus \{0\} \rightarrow \{0,1\}$. For $i \in \{0,1\}$ and $n \in \mathbb{N}$, $f_i(n)$ is 
defined by: $f_i(n) = |f^{-1}(i) \cap [1, n]|$. 


Intuitively, $f_0(n)$ counts how many numbers between 1 and n (inclusive) are 
mapped by f to 0 and $f_1(n)$ counts how many are mapped to 1. Because f(n) 
always equals 0 or 1, it is easy to see that for every n > 0, $n = f_1(n) + f_0(n)$. 


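$\psi^{=}_{\geq n} = \exists \vec{x}.\, [\bigwedge_{i=1}^{n} p(x_i) \wedge \delta_n]$,    $\psi^{\neq}_{\geq n} = \exists \vec{x}.\, [\bigwedge_{i=1}^{n} \neg p(x_i) \wedge \delta_n]$, 

$\psi^{=}_{=n} = \exists \vec{x}.\, [\delta_n \wedge \bigwedge_{i=1}^{n} p(x_i) \wedge \forall x.\, [p(x) \rightarrow \bigvee_{i=1}^{n} x = x_i]]$, 

$\psi^{\neq}_{=n} = \exists \vec{x}.\, [\delta_n \wedge \bigwedge_{i=1}^{n} \neg p(x_i) \wedge \forall x.\, [\neg p(x) \rightarrow \bigvee_{i=1}^{n} x = x_i]]$. 


Fig. 5. Cardinality formulas for signatures with a unary function symbol s. $\vec{x}$ stands 
for $x_1, \ldots, x_n$, p(x) for s(x) = x, and $\delta_n$ for $\bigwedge_{1 \leq i < j \leq n} \neg(x_i = x_j)$. 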


Lemma 6. There exists a function $f : \mathbb{N} \setminus \{0\} \rightarrow \{0,1\}$ such that f(1) = 1 with 
the properties that: f is not computable; and, for every $k \in \mathbb{N} \setminus \{0\}$, 
$f_0(2^k) = f_1(2^k)$. 


Example 6. The constant function that assigns 0 to all positive integers satisfies 
neither the first nor the second condition of Lemma 6. The function that assigns 
0 to even numbers and 1 to odd numbers satisfies the second condition, but not 
the first. Of course, any non-computable function satisfies the first condition. One 
example is the function that returns 1 if the Turing machine 
encoded by the given number halts and 0 otherwise, under some encoding. 
Finding a function that admits both conditions is more challenging. 
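
The counting functions of Definition 7 and the balance condition of Lemma 6 are easy to check for computable candidates such as the two from Example 6. The sketch below is illustrative only: the names `count`, `balanced_on_powers_of_two`, `parity`, and `constant_zero` are hypothetical, and by Lemma 6 an actual f for the construction is not computable, so no program can play that role.

```python
def count(f, i, n):
    """f_i(n): how many m in [1, n] satisfy f(m) == i."""
    return sum(1 for m in range(1, n + 1) if f(m) == i)

def balanced_on_powers_of_two(f, max_k):
    """Check f_0(2^k) == f_1(2^k) for every k in [1, max_k]."""
    return all(count(f, 0, 2 ** k) == count(f, 1, 2 ** k)
               for k in range(1, max_k + 1))

def parity(n):
    """1 on odd numbers, 0 on even numbers (computable, so not a valid f)."""
    return n % 2

def constant_zero(n):
    """Maps every positive integer to 0."""
    return 0

if __name__ == "__main__":
    print(balanced_on_powers_of_two(parity, 10))
    print(balanced_on_powers_of_two(constant_zero, 10))
    # n = f_1(n) + f_0(n) holds for any f with values in {0, 1}.
    print(all(count(parity, 0, n) + count(parity, 1, n) == n
              for n in range(1, 50)))
```

The check only covers finitely many k, so it can refute the balance condition for a candidate but cannot establish it in general.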


Let f be some function with the properties listed in Lemma 6. We can now 
define $T_f$ over $\Sigma_s$ (note that f itself is not a part of the signature, but is rather 
used to help define the axioms of $T_f$). $T_f$ consists of those structures A that 
either (i) have a finite cardinality n, with $f_1(n)$ elements satisfying $s^A(a) = a$, 
and $f_0(n)$ elements satisfying $s^A(a) \neq a$ (and thus A satisfies $\psi^{=}_{\geq f_1(k)} \wedge \psi^{\neq}_{\geq f_0(k)}$ 
for $k \leq n$, and $\psi^{=}_{=f_1(n)} \wedge \psi^{\neq}_{=f_0(n)}$ and hence $\bigvee_{i=1}^{k} [\psi^{=}_{=f_1(i)} \wedge \psi^{\neq}_{=f_0(i)}]$ for all $k \geq n$); 
or (ii) have infinitely many elements, with infinitely many elements satisfying 
each condition, $s^A(a) = a$ and $s^A(a) \neq a$ (and thus A satisfies $\psi^{=}_{\geq f_1(k)} \wedge \psi^{\neq}_{\geq f_0(k)}$ 
for all $k \in \mathbb{N}$). Note that the description is well-defined because an element must 
always satisfy either $s^A(a) = a$ or $s^A(a) \neq a$, but never both or neither of these. 
The theory $T_f^s$ is similar to $T_f$, but in addition to $Ax(T_f)$ its structures must 
also satisfy $\psi_\vee$. 


Remark 1. The construction of $T_f^s$ from $T_f$ is very similar to the general 
construction of Definition 6. However, the corresponding result, Lemma 5, according 
to which all properties but convexity are preserved by this operation, is only 
shown for cases where the original signature is empty, which is not 
the case for $T_f$. Obtaining $T_f^s$ from $T_f$ is not done by adding a function 
symbol, but rather by changing the axiomatization of the already existing function 
symbol. While we do prove that $T_f^s$ has the required properties, a general result 
in the style of Lemma 5 for arbitrary signatures, with the ability to preserve an 
existing function symbol instead of adding a new one, is left for future work. 


Example 7. Let $A_n$ be a $\Sigma_s$-model with domain $\{a_1, \ldots, a_n\}$ such that: $s^{A_n}(a_i)$ 
equals $a_i$ if $1 \leq i \leq f_1(n)$, and $a_1$ if $f_1(n) < i \leq n$ (the second condition may be 
void if n = 1). Then $A_n$ is a model of both $T_f$ and $T_f^s$. 

If $\kappa$ is an infinite cardinal, let $A_\kappa$ be a $\Sigma_s$-model with domain $A \cup \{a_n : n \in 
\mathbb{N} \setminus \{0\}\}$ (where A is a set of cardinality $\kappa$ disjoint from $\{a_n : n \in \mathbb{N} \setminus \{0\}\}$) such 
that $s^{A_\kappa}(a_i) = a_i$ for each $i \in \mathbb{N} \setminus \{0\}$, and $s^{A_\kappa}(a) = a_1$ for each $a \in A$. Then 
$A_\kappa$ is a model of both $T_f$ and $T_f^s$. 
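
The finite model $A_n$ of Example 7 can be built and checked mechanically once some f is fixed. In the sketch below, the parity function is used as a computable stand-in for f. This is an assumption made purely for illustration, since the f of Lemma 6 is not computable. Elements $a_1, \ldots, a_n$ are represented by the integers 1 to n, and the helper names are hypothetical.

```python
def count(f, i, n):
    """f_i(n): how many m in [1, n] satisfy f(m) == i."""
    return sum(1 for m in range(1, n + 1) if f(m) == i)

def build_finite_model(f, n):
    """Interpretation of s on {1, ..., n}: identity on the first f_1(n)
    elements, and every remaining element is sent to element 1."""
    f1 = count(f, 1, n)
    return {i: (i if i <= f1 else 1) for i in range(1, n + 1)}

def fixed_points(s):
    """Number of elements a with s(a) == a."""
    return sum(1 for a in s if s[a] == a)

def satisfies_psi_or(s):
    """psi_or: for every a, s(s(a)) equals a or s(a)."""
    return all(s[s[a]] in (a, s[a]) for a in s)

def stand_in_f(n):
    """Computable stand-in with f(1) = 1 (parity); NOT the f of Lemma 6."""
    return n % 2

if __name__ == "__main__":
    for n in range(1, 21):
        s = build_finite_model(stand_in_f, n)
        # Exactly f_1(n) fixed points and f_0(n) non-fixed points.
        assert fixed_points(s) == count(stand_in_f, 1, n)
        assert n - fixed_points(s) == count(stand_in_f, 0, n)
        # The extra axiom of the non-convex variant also holds.
        assert satisfies_psi_or(s)
    print(build_finite_model(stand_in_f, 5))
```

Because f(1) = 1 forces $f_1(n) \geq 1$, element 1 is always a fixed point, so every element mapped to it is a non-fixed point whose image is fixed; this is why the same model also satisfies $\psi_\vee$.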


To show that $T_f$ is smooth and finitely witnessable, we construct, given a 
$T_f$-interpretation, another $T_f$-interpretation by (possibly) adding two disjoint 
sets of elements to the interpretation, one whose elements will satisfy s(a) = a, 
and one whose elements will satisfy $s(a) \neq a$. 

To show that it is not strongly finitely witnessable, we use the following 
lemmas, which are interesting in their own right. According to the first, the 
mincard function of $T_f$ is not computable. 


Lemma 7. The mincard function of $T_f$ is not computable. 


The second lemma that is needed in order to prove that $T_f$ is not strongly 
finitely witnessable is quite surprising. As it turns out, for quantifier-free 
formulas, the set of $T_f$-satisfiable formulas coincides with the set of satisfiable formulas. 
That is, even though the definition of $T_f$ is very complex, it induces the same 
satisfiability relation, over quantifier-free formulas, as the simplest theory possible: 
the theory axiomatized by the empty set (or, equivalently, all valid first-order 
sentences). 


Lemma 8. Every quantifier-free $\Sigma_s$-formula that is satisfiable is $T_f$-satisfiable. 


Note that Lemma 8 does not hold for quantified formulas in general. For example, 
the formula $\forall x.\, s(x) \neq x$ is satisfiable but not $T_f$-satisfiable: because f(1) = 1, 
every $T_f$-interpretation A must have at least one element a with $s^A(a) = a$. 

Using Lemmas 7 and 8, it is possible to show that $T_f$ is not strongly finitely 
witnessable: 


Lemma 9. $T_f$ is not strongly finitely witnessable. 


The idea of the proof of Lemma 9 goes as follows: assume for contradiction 
that there is a strong witness wit. The mincard function for $T_f$ can then be 
defined as 


$\mathrm{mincard}(\phi) = \min\{|V/E| : E \in eq \text{ and } wit(\phi) \wedge \delta_E \text{ is } T_f\text{-satisfiable}\}$,    (1) 


where eq is the set of all equivalence relations E on $V = vars(wit(\phi))$, with 
$\delta_E$ denoting the arrangement corresponding to E. Clearly, the sets V and eq can 
be effectively computed. Also, by Lemma 8, testing for the $T_f$-satisfiability of 
quantifier-free formulas is decidable. Together with our assumption that wit is 
computable, we get that the mincard function of $T_f$ is computable, which 
contradicts Lemma 7. 
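
The computation described by Eq. (1) is a finite search: enumerate all equivalence relations on V (as set partitions) and keep the smallest number of classes among those whose arrangement is satisfiable together with the witness. The sketch below shows only the shape of that search. The satisfiability test is passed in as an oracle, and the toy oracle `consistent_with`, which only understands disequalities between variables, is a hypothetical placeholder for the real check of $wit(\phi) \wedge \delta_E$.

```python
def partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part

def mincard(variables, is_satisfiable):
    """Smallest number of classes among satisfiable arrangements, or None.

    `is_satisfiable(partition)` is an oracle supplied by the caller.
    """
    sizes = [len(p) for p in partitions(list(variables)) if is_satisfiable(p)]
    return min(sizes) if sizes else None

def consistent_with(disequalities):
    """Toy oracle (an assumption of this sketch): a partition is accepted
    iff no block contains both sides of a required disequality."""
    def oracle(partition):
        return not any(a in block and b in block
                       for block in partition
                       for (a, b) in disequalities)
    return oracle

if __name__ == "__main__":
    variables = ["x", "y", "z"]
    print(mincard(variables, consistent_with({("x", "y"), ("y", "z")})))
    print(mincard(variables, consistent_with(set())))
```

The number of partitions grows quickly with the number of variables, which does not matter for the computability argument but would matter for any practical use.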

The arguments for $T_f^s$ are very similar, and require minor changes in the 
corresponding proofs for $T_f$. 


Remark 2. We remark on the connection between the results regarding $T_f$ and 
$T_f^s$, and those of [3]. What we show here is that $T_f$ ($T_f^s$) is polite but not 
strongly polite. Figure 1 of [3] summarizes the relations between these two 
properties for the one-sorted case. It shows that polite theories that are axiomatized 
by a universal set of axioms, and whose quantifier-free satisfiability problem is 
decidable, are strongly polite. While $T_f$ is decidable for quantifier-free formulas 
(this is a corollary of Lemma 8), its presentation here is definitely not as a 
universal theory. On the other hand, [3] also shows that decidable polite theories 
for which checking if a finite interpretation belongs to the theory is decidable 
are also strongly polite. However, it is undecidable, given an interpretation, to 
check whether it belongs to $T_f$ (and $T_f^s$): such an algorithm would lead to an 
algorithm to compute f as well. Thus, the theories $T_f$ and $T_f^s$ are polite, but 
do not meet the criteria for strong politeness from [3]. And indeed, they are not 
strongly polite. 


4.4 Theories over Many-Sorted Non-empty Signatures 


For the last column of Table 1, all possible theories can be obtained from theories 
that were already defined, using a combination of Definitions 4 to 6, and so there 
is no need to present additional theories specifically for many-sorted non-empty 
signatures. 


Example 8. Line 1 includes the theory $((T_{\geq n})^2)_s$, obtained from $(T_{\geq n})^2$ using 
Definition 5, where the latter theory is obtained from $T_{\geq n}$ using Definition 4. 
This theory admits all properties, including convexity. To obtain a non-convex 
variant, the theory $((T_{\geq n})^2)_\vee$ is constructed in a similar fashion, using Definition 
6 instead of Definition 5. 


With many-sorted non-empty signatures, we can always find an example for 
each combination of properties, except for those that are trivially impossible due 
to Theorems 1 and 2 (i.e., theories that are strongly finitely witnessable but not 
finitely witnessable and theories that are smooth but not stably infinite). This 
is nicely depicted by Fig. 6. Theorems 1 and 2 are represented in this figure by 
the location of the circles: the circle for smooth theories is entirely inside the 
circle for stably infinite theories, and similarly for strongly finitely witnessable 
and finitely witnessable theories. Then, for every region in this figure, the right- 
most column of Table 1 has an example, the sole exception being the region that 
represents unicorn theories. 


Remark 3. For non-empty signatures, we chose to include functions rather than 
predicates. This is not essential as we can replace function symbols by predicate 
symbols by including the sort of the result of the function as the last component 
of the arity of the predicate, and then adding an axiom that forces the predicate 
to be a function. 


5 Polite Combination Without Smoothness 


Polite combination of theories was introduced in [10]. There, it was claimed that 
in order to combine a theory T with any other theory using polite combination, it 
suffices for T to be smooth and finitely witnessable (that is, polite). Later, in [6], 
this condition was corrected, and it was shown that in fact a stronger requirement 
is needed from T: it has to be smooth and strongly finitely witnessable (that is, 
strongly polite) to be applicable for the combination method. 

Given that weakening strong finite witnessability to finite witnessability 
results in a condition that does not suffice, it is natural to ask whether there is 
any other way to weaken the required conditions for polite combination. Rather 
than weakening strong finite witnessability to finite witnessability, here we con- 
sider another option: weakening the smoothness condition to stable infiniteness. 
Thus, the main result of this section is that polite combination can be done for 
theories that are stably infinite and strongly finitely witnessable, even if they 
are not smooth. 


Fig. 6. A diagram of the various notions studied in this paper. (Color figure online) 


Our contribution can be understood by viewing Fig. 6, ignoring the circle that 
represents convexity (a property unrelated to the current section). [6] shows 
that polite combination can be done for the purple region, which represents 
smooth and strongly finitely witnessable theories. [6] also presented an example 
showing that expanding the same combination method to the blue region, which 
represents smooth and finitely witnessable theories, results in an error. Here we 
instead expand polite combination to the red region, which represents stably 
infinite and strongly finitely witnessable theories. Now, the red region, if not 
empty, is only populated by unicorn theories (see Sect. 4). If such theories do not 
exist, the result follows immediately. Until this is settled, however, we provide a 
direct proof, regardless of the existence of unicorn theories. 

The next theorem shows that polite theory combination can be done for 
theories that are not necessarily strongly polite (smooth and strongly finitely 
witnessable), but rather that are simply stably infinite and strongly finitely 
witnessable. 


Theorem 9. Let $\Sigma_1$ and $\Sigma_2$ be disjoint signatures with sorts $S_1$ and $S_2$; let $T_1$ 
be a $\Sigma_1$-theory, $T_2$ be a $\Sigma_2$-theory, and $T = T_1 \oplus T_2$; and let $\phi_1$ be a quantifier-free 
$\Sigma_1$-formula and $\phi_2$ a quantifier-free $\Sigma_2$-formula. 

Assume that $T_2$ is stably infinite and strongly finitely witnessable w.r.t. $S = 
S_1 \cap S_2$, with strong witness wit. Let $\psi = wit(\phi_2)$, $V_\sigma = vars_\sigma(\psi)$ for every $\sigma \in S$ 
and $V = \bigcup_{\sigma \in S} vars_\sigma(\psi)$. Then the following are equivalent: 


1. $\phi_1 \wedge \phi_2$ is T-satisfiable; 
2. there exists an arrangement $\delta_V$ over V such that $\phi_1 \wedge \delta_V$ is $T_1$-satisfiable and 
$\psi \wedge \delta_V$ is $T_2$-satisfiable. 
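
Condition 2 of Theorem 9 suggests a simple search: enumerate the arrangements over V, one partition per shared sort, and ask each theory's solver about its own formula conjoined with the arrangement. The sketch below shows only this outer loop. The two solvers are passed in as oracles `sat_t1` and `sat_t2`, which are hypothetical placeholders for decision procedures for $T_1$ and $T_2$, and the witness function is assumed to have been applied already.

```python
from itertools import product

def partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def arrangements(vars_by_sort):
    """Yield arrangements: one partition of the variables for each sort."""
    sorts = sorted(vars_by_sort)
    per_sort = [list(partitions(list(vars_by_sort[s]))) for s in sorts]
    for choice in product(*per_sort):
        yield dict(zip(sorts, choice))

def find_arrangement(vars_by_sort, sat_t1, sat_t2):
    """Return an arrangement accepted by both oracles, or None.

    sat_t1(arr) should decide T1-satisfiability of phi_1 with arr;
    sat_t2(arr) should decide T2-satisfiability of wit(phi_2) with arr.
    Both are assumptions supplied by the caller.
    """
    for arr in arrangements(vars_by_sort):
        if sat_t1(arr) and sat_t2(arr):
            return arr
    return None

def same_block(partition, a, b):
    """True iff a and b are identified by the partition."""
    return any(a in block and b in block for block in partition)

if __name__ == "__main__":
    shared = {"sigma": ["x", "y"]}
    # Toy oracles: T1 side needs x and y distinct; T2 side accepts anything.
    needs_distinct = lambda arr: not same_block(arr["sigma"], "x", "y")
    accepts_all = lambda arr: True
    print(find_arrangement(shared, needs_distinct, accepts_all))
```

By the theorem, such a search is sound and complete when $T_2$ is stably infinite and strongly finitely witnessable; practical solvers would prune the enumeration rather than visit every arrangement.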


The theorem relies heavily on the following lemma, which proves that stable infiniteness and 
strong finite witnessability imply a weaker notion of smoothness. In this weaker 
notion, uncountable domains in the original structure A are reduced to countable 
ones, and the function $\kappa$, which dictates the cardinalities of models, is assumed to 
never assign an uncountable cardinal to any of the sorts. 


Lemma 10. Let $\Sigma$ be a signature with $S \subseteq S_\Sigma$, and T a theory over $\Sigma$. If T is a 
stably infinite and strongly finitely witnessable theory, both w.r.t. the set of sorts 
S, then: for every quantifier-free $\Sigma$-formula $\phi$; T-interpretation A that satisfies 
$\phi$; and function $\kappa$ from $S^A = \{\sigma \in S : |\sigma^A| < \omega\}$ to the class of cardinals such 
that $|\sigma^A| \leq \kappa(\sigma) \leq \omega$ for every $\sigma \in S^A$, there exists a T-interpretation B that 
satisfies $\phi$ with $|\sigma^B| = \kappa(\sigma)$ for every $\sigma \in S^A$, and $|\sigma^B| = \omega$ for every $\sigma \in S \setminus S^A$. 


The proof of Theorem 9 goes as follows: first, we make the infinite domains 
corresponding to shared sorts of a model A of $\phi_1 \wedge \delta_V$ at most countable, by 
applying Lemma 2. We then proceed similarly to the proof of the polite 
combination method in [6]: decrease a model B of $\psi \wedge \delta_V$ by using wit as a strong 
witness; and then make the cardinalities of the shared sorts in B equal those of 
A (which are at most countable), by using Lemma 10. 

This result greatly improves the state-of-the-art in polite theory combination, 
which requires proving that one of the theories is both smooth and strongly 
finitely witnessable. Thanks to this theorem, proving smoothness can be replaced 
by proving stable infiniteness, which is typically a much easier task. 


6 Conclusion 


As mentioned, there are two main contributions offered in this paper, both 
associated with the theme of theory combination. In Sect. 4, we provide a table with 
examples for almost all the combinations of stable infiniteness, smoothness, con- 
vexity, finite witnessability, and strong finite witnessability known not to be 
impossible. Section 3 provides theorems proving the sharpness of the examples 
provided. The second contribution is a new combination theorem, according to 
which polite theory combination can be done without smoothness, provided we 
have instead stable infiniteness. 

Many ideas for future work arise from the work presented here. A first 
direction would be to settle the question of whether unicorn theories exist: if 
they do not, a proof would probably involve an interesting generalization of 
the upward Löwenheim-Skolem theorem for many-sorted logic and would imply 
that strongly polite theories are simply the stably infinite and strongly finitely 
witnessable theories, thus greatly simplifying the proof of Theorem 9; if unicorn 
theories do exist, one wonders if they can be combined in some meaningful 
way. Another direction of future work involves considering other model-theoretic 
properties in our table, such as shininess, gentleness, flexibility, and so on, as well 
as the effect of taking proper subsets of sorts for signatures containing more than 
one sort. 


Abstract. First-order logic fragments mixing quantifiers, arithmetic, 
and uninterpreted predicates are often undecidable, as is, for instance, 
Presburger arithmetic extended with a single uninterpreted unary pred- 
icate. In the SMT world, difference logic is a quite popular fragment 
of linear arithmetic which is less expressive than Presburger arithmetic. 
Difference logic on integers with uninterpreted unary predicates is known 
to be decidable, even in the presence of quantifiers. We here show that 
(quantified) difference logic on real numbers with a single uninterpreted 
unary predicate is undecidable, quite surprisingly. Moreover, we prove 
that difference logic on integers, together with order on reals, combined 
with uninterpreted unary predicates, remains decidable. 


Keywords: First-order logic · Decidability · SMT · Arithmetic ·
Uninterpreted predicates


1 Introduction 


The success of satisfiability modulo theories (SMT) solvers in verification can 
be attributed to several things, but one of them is indisputably the omnipres- 
ence, in the combination of theories, of arithmetic reasoners. As SMT solvers 
get stronger in quantified reasoning, it becomes more interesting to get a clear 
picture of decidability frontiers when arithmetic is used in a quantified SMT 
context. Some pure arithmetic theories are already undecidable, even in their 
quantifier-free fragment, e.g., Peano arithmetic [12], i.e., a first-order theory 
of the natural numbers with addition and multiplication. However, Presburger 
arithmetic, somehow the linear restriction of Peano arithmetic, is decidable even 
in the quantified case [10], but augmenting Presburger arithmetic with a single 
unary uninterpreted predicate already yields undecidability [7,11,19]. To obtain 
a decidable fragment mixing arithmetic and uninterpreted predicates, one must 
further restrict the expressiveness. 

In the SMT world, difference logic used to be a popular fragment of arithmetic,
because of its low complexity in the quantifier-free case. In this fragment,
arithmetic is limited to difference constraints of the form $x - y \bowtie c$, where $x$
and $y$ are variables, $c$ is an integer constant, and $\bowtie$ belongs to $\{<, \le, =, \ge, >\}$.
Difference constraints can, e.g., express conditions on the distance between two
variables, the atomic formula $x - y = 2$ stating that the distance between the values
of $x$ and $y$ must be exactly 2. Notice that since difference constraints involve
only two variables ($c$ is an integer constant), those constraints are strictly less
expressive than linear constraints in Presburger arithmetic. The decidability of
the logic mixing difference constraints and unary uninterpreted predicates, when
interpreted over $\mathbb{N}$ (or similarly $\mathbb{Z}$), reduces to the decidability of the monadic
second-order theory of one successor, usually referred to as S1S. The decidability
of S1S has been established thanks to the concept of infinite-word automaton [4].

© The Author(s) 2023

On the real domain, it is well known that the first-order theory of real- 
closed fields, which is in a sense the real counterpart of Peano arithmetic, is 
decidable [20] even in the presence of quantifiers. Whereas this might give the 
impression that decidability is more often obtained on the reals than on the 
integers, we here prove that the logic mixing difference constraints and unary 
uninterpreted predicates, when interpreted over R, is undecidable. 

Further restricting the arithmetic language, and considering order on the 
real domain only, it is known that the monadic second-order theory of order is 
undecidable [9,17], but its universal fragment is decidable [5]. In this work, we 
establish that the fragment mixing unary uninterpreted predicates, difference 
constraints over integer variables, and order constraints over real variables is 
decidable. 

Section 2 provides some prerequisites and the precise definition of the stud- 
ied fragments. In Sect. 3, we prove the decidability of the fragment mixing unary 
uninterpreted predicates, difference constraints over integer variables, and order 
constraints over real variables. This was already the subject of a work-in-progress 
workshop paper [1]. In Sect. 4, we prove that the fragment of quantified difference
constraints over real variables extended with a single unary uninterpreted
predicate is undecidable. 


2 Preliminaries 


We refer to, e.g., [8] for a general introduction to first-order logic with equality,
and assume that the reader is familiar with the notions of signature, term,
variable, and formula. We use the usual logical connectives ($\lor$, $\land$, $\neg$, $\Rightarrow$, $\equiv$)
and first-order quantification $\exists x.\, \varphi$ and $\forall x.\, \varphi$, respectively equivalent to writing
$\exists x\, (\varphi)$ and $\forall x\, (\varphi)$, i.e., the dot stands for an opening parenthesis that is closed
at the end of the formula. Variable symbols are denoted by $x, y, z, \ldots$ and are
meant to be interpreted as real numbers.

Our signature contains the interpreted arithmetic symbols $0$, $1$, $+$, $-$, $<$, $\le$,
$\ge$, $>$, $=$, and other constants in $\mathbb{N}$ that stand for terms $1 + 1 + \cdots + 1$. We
furthermore use a monadic (i.e., unary) interpreted predicate $x \in \mathbb{Z}$ to denote
that $x$ has an integer value. The signature also contains uninterpreted predicate
symbols $P, Q, \ldots$ In the whole article, we only consider unary predicate symbols.
Indeed, including binary uninterpreted predicates without restriction on first-order
quantification directly yields undecidability. Our language is the set of all
well-formed formulas, in the usual sense, built using symbols from the signature.
Further specific restrictions will be introduced later. 

An interpretation specifies a domain (i.e., a set of elements), assigns a value 
in the domain to each free variable, and assigns relations of appropriate arity on 
the domain to predicate symbols in the signature. Throughout the article, the 
interpretation domain is always $\mathbb{R}$. The arithmetic symbols $0$, $1$, $+$, $-$, $<$, $\le$, $\ge$,
$>$, $=$ are interpreted as expected on $\mathbb{R}$, and $x \in \mathbb{Z}$ is true if and only if $x$ has
an integer value¹. An interpretation assigns an arbitrary subset of the domain $\mathbb{R}$
to each unary predicate. By extension, an interpretation assigns a value in R to 
every term, and a truth value to every formula. We denote the interpretation I 
of a variable x by I[x], and the interpretation of a predicate P by I[P]. A model 
of a formula is an interpretation that assigns true to this formula. A formula is 
satisfiable on a domain (here R) if it has a model on that domain. 


2.1 Difference Arithmetic with Unary Predicates 


We consider several fragments where the language is restricted, in particular in 
the way that the arithmetic relations can be used. A fragment is decidable if 
there exists a procedure to check whether a given formula in this fragment is 
satisfiable. 

In the various fragments introduced below, all arithmetic atoms are either
order constraints of the form $x \bowtie y$, or difference constraints of the form $x - y \bowtie c$,
where $x$ and $y$ are variables, $c$ is a constant in $\mathbb{Z}$, and $\bowtie \in \{<, \le, =, \ge, >\}$. As
a reminder, the language of our formulas only contains unary predicates. The
only atoms besides the arithmetic ones are of the form $P(x)$, where $P$ is an
uninterpreted predicate symbol and $x$ is a variable, and $x \in \mathbb{Z}$, where $x$ is a
variable. Note that the addition of constraints of the form $x \bowtie c$, where $x$ is a
variable and $c$ is an integer constant, to fragments that already admit difference
constraints does not increase their expressive power: constraints $x \bowtie c$ can be
replaced by difference constraints $x - v_0 \bowtie c$, where $v_0$ is a particular variable
in $\mathbb{Z}$ intended to be interpreted as zero. Indeed, shifting an interpretation by a
fixed integer $j$ (i.e., the new interpretation of any variable $x$ is the old value
of $x$ plus $j$, and the new value of any predicate $P$ for a real number $d + j$ is the
old value of $P$ for $d$) preserves the assigned value of formulas in our fragments.
Therefore any model where $v_0$ is an arbitrary integer can be shifted into a model
where $v_0$ is zero.
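
The shift argument can be checked on a small instance. The following is a minimal sketch, not part of the original development: the names `OPS`, `holds`, `shift` and the sample assignment are hypothetical, exact rationals stand in for real values, and uninterpreted predicates (which would be shifted in the same way) are left out.

```python
from fractions import Fraction
import operator

# The five comparison symbols allowed in a difference constraint.
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def holds(env, x, y, op, c):
    """Truth value of the difference constraint x - y (op) c under env."""
    return OPS[op](env[x] - env[y], c)

def shift(env, j):
    """Shift the value of every variable by the same integer j."""
    return {name: value + j for name, value in env.items()}

# Hypothetical assignment in which the landmark v0 is the integer 5, not 0.
env = {"x": Fraction(15, 2), "y": Fraction(9, 2), "v0": Fraction(5)}
shifted = shift(env, -env["v0"])   # v0 is now interpreted as zero
```

Every difference constraint keeps its truth value under `shift`, and in `shifted` the constraint `x - v0 > 2` says the same thing as the plain bound `x > 2`.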

As syntactic sugar, conjunctions of order constraints will be merged to
improve readability, i.e., we will often write $x < y < z$ rather than $x < y \land y < z$.
Finally, we use the shorthand $P(x + c)$ instead of $\exists y.\, y - x = c \land P(y)$, where $x$
is a free variable and $c \in \mathbb{Z}$.

We now introduce our fragments of interest. Their names are inspired from 
the SMT-LIB nomenclature, where acronyms stand for the theories that appear 
in the combinations: 


- UF1: the theory of uninterpreted functions, with the restriction that uninterpreted
  symbols may only correspond to monadic predicates;

- RO: the theory of order on the reals only;

- IRO: the theory of order on the reals and integers;

- IDL: difference logic on the integers;

- RDL: difference logic on the reals.

¹ In the current context, this choice of notation for mixed integer-real arithmetic is
simpler than using a multi-sorted logic.


UF1-RO. The fragment UF1-RO is the fragment with unary uninterpreted predicates
and order constraints between variables interpreted over $\mathbb{R}$. Difference logic
constraints and atoms of the form $x \in \mathbb{Z}$ are not allowed.


Example: The formula $\forall x\, \exists y, z.\; y < x < z \land \forall t.\, (y < t < z \land P(t)) \Rightarrow t = x$
describes a predicate $P$ that is true only on isolated real numbers.

UF1-IRO. The fragment UF1-IRO is the extension of UF1-RO where atoms of the
form $x \in \mathbb{Z}$ are allowed. This fragment can express order relations between real
and integer variables.


Example: The formula $\forall x, y.\, (x < y \land x \in \mathbb{Z} \land y \in \mathbb{Z}) \Rightarrow \exists v.\, x < v < y \land P(v)$
describes a predicate $P$ that is true for at least one value located between any
two integers.

UF1-IDL-IRO. The fragment UF1-IDL-IRO is an extension of the fragment UF1-IRO
(and therefore of UF1-RO). It is also interpreted over $\mathbb{R}$. Order constraints between
variables and atoms of the form $x \in \mathbb{Z}$ are allowed. Additionally, difference logic
constraints are allowed, but they can only involve integer-guarded variables.

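In order to enforce this integer-guard restriction on difference logic constraints,
UF1-IDL-IRO formulas must be well-guarded, i.e., difference logic constraints
can only appear in the two following contexts:

$$x \in \mathbb{Z} \land y \in \mathbb{Z} \land x - y \bowtie c,$$
$$(x \in \mathbb{Z} \land y \in \mathbb{Z}) \Rightarrow x - y \bowtie c,$$

where $x$ and $y$ are variables, $c \in \mathbb{Z}$ is a constant, and $\bowtie \in \{<, \le, =, \ge, >\}$.

Example: The following formula describes a predicate that is either true on all
odd numbers and false on all even numbers, or the opposite, as well as true on
all non-integer numbers:

$$
\begin{aligned}
& [\forall x, y.\, (x \in \mathbb{Z} \land y \in \mathbb{Z} \land y - x = 2) \Rightarrow (P(x) \equiv P(y))] \\
\land\; & [\exists x, y.\, x \in \mathbb{Z} \land y \in \mathbb{Z} \land P(x) \land \neg P(y)] \;\land\; [\forall z.\, \neg(z \in \mathbb{Z}) \Rightarrow P(z)]
\end{aligned}
$$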

UF1-RDL. The fragment UF1-RDL is the fragment interpreted over $\mathbb{R}$, where order
constraints, difference logic constraints and unary predicate atoms are allowed
without any restriction. The use of atoms of the form $x \in \mathbb{Z}$ is forbidden. Since
order constraints are a special case of difference logic constraints, the name of
the fragment only refers to RDL and not RO.

Example: The formula $\forall x\, \exists y.\; 0 < y - x < 3 \land P(y)$ describes a predicate $P$ such
that any subinterval of $\mathbb{R}$ of length greater than or equal to 3 contains a value for
which $P$ is true.

Note: It might appear to the reader that a missing logic in this nomenclature
is UF1-IRDL, with difference logic constraints on both real and integer variables. 
We will later show that UF1-RDL is already undecidable, so it makes little sense 
to introduce any extension of it. 


3 Decidability of UF1-IDL-IRO 


The fragment UF1-RO is actually a restriction of the universal fragment of the 
monadic second-order theory of the real order R, i.e., UF1-RO augmented with 
universal quantification of predicate variables. It has been established in [5] that 
the universal fragment of the monadic second-order theory of the real order R 
is decidable, which trivially implies the decidability of UF1-RO. We show here 
that its extension UF1-IDL-IRO (and therefore UF1-IRO) is also decidable, by a 
reduction to UF1-RO. 


Theorem 1. UF1-IDL-IRO and UF1-IRO are decidable. 


Note that the decidability of UF1-IRO is a direct consequence of the decidability
of UF1-IDL-IRO, since UF1-IDL-IRO is an extension of UF1-IRO. The remainder
of this section is thus dedicated to proving that UF1-IDL-IRO is decidable.


3.1 Recognizing Integer Values 


We first show how to define in UF1-RO a predicate $P_{int}$ over $\mathbb{R}$ that is $<$-isomorphic
to $\mathbb{Z}$, i.e., such that there exists a bijection between the sets described
by $P_{int}$ and $\mathbb{Z}$ that preserves the order relation over their elements. Integer guards
in UF1-IDL-IRO will later be translated using $P_{int}$. Intuitively, an integer-guarded
variable in a UF1-IDL-IRO formula will correspond to a variable taking its value
in the set described by $P_{int}$ in the translated UF1-RO formula.

We axiomatize $P_{int}$ in UF1-RO as follows:


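• Every element of $P_{int}$ is isolated:
  $\forall x\, \exists y, z.\; y < x < z \land \forall t.\, [y < t < z \land P_{int}(t)] \Rightarrow t = x$.


• Every point in $\mathbb{R}$ has a unique successor in $P_{int}$:
  $\forall x\, \exists y.\; x < y \land P_{int}(y) \land \forall t.\, x < t < y \Rightarrow \neg P_{int}(t)$.


• Similarly, every point in $\mathbb{R}$ has a unique predecessor in $P_{int}$:
  $\forall x\, \exists y.\; y < x \land P_{int}(y) \land \forall t.\, y < t < x \Rightarrow \neg P_{int}(t)$.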


The set of all integers is a model for $P_{int}$, therefore the above axiomatization
is consistent. The set of elements satisfying $P_{int}$ is necessarily infinite and does
not admit a maximal or a minimal element. This is a direct consequence of the
successor and predecessor axioms. More interestingly, this set is also necessarily
countable. Indeed, since each point is isolated, there exists a mapping that
sends the elements satisfying $P_{int}$ to pairwise disjoint open intervals. Any set of
pairwise disjoint intervals in $\mathbb{R}$ with non-zero length is necessarily countable [18],
since each of them contains a rational value that does not belong to the others.

It is now possible to define a successor relation on the real numbers satisfying $P_{int}$
with the formula $\mathit{Succ}(x, y) \equiv P_{int}(x) \land P_{int}(y) \land y < x \land \forall z.\, y < z < x \Rightarrow \neg P_{int}(z)$,
i.e., $x$ is the successor of $y$, or equivalently, $y$ is the predecessor of $x$.

The axiomatization of $P_{int}$ is, in fact, precise enough to yield the following lemma.


Lemma 1. For any model $\mathcal{M}$ of $P_{int}$, the set $\mathcal{M}[P_{int}]$ is $<$-isomorphic to $\mathbb{Z}$.


For convenience in the proof, we define $0_{int}$ as an arbitrary existentially
quantified value that belongs to the set described by $P_{int}$.


Proof. Given a model $\mathcal{M}$ of the axiomatization of $P_{int}$, we need to define a
bijection between the set $\mathcal{M}[P_{int}]$ and $\mathbb{Z}$ that preserves order.

Let us define a mapping $f$ from $\mathcal{M}[P_{int}]$ to $\mathbb{Z}$. We set $f(0_{int}) = 0$, and
then define recursively:


- $f(y) = f(x) + 1$ for each $x, y \in \mathcal{M}[P_{int}]$ such that $y > 0_{int}$ and $\mathit{Succ}(y, x)$,
- $f(y) = f(x) - 1$ for each $x, y \in \mathcal{M}[P_{int}]$ such that $y < 0_{int}$ and $\mathit{Succ}(x, y)$.


Thanks to the fact that every element of $\mathcal{M}[P_{int}]$ has a unique predecessor
and successor, it follows that $f$ ranges over the whole set $\mathbb{Z}$, proving that $f$ is
surjective. Since it is clear that $f$ preserves order, it follows that $f$ is strictly
increasing, and therefore injective. It remains to show that $f$ is well defined for
every element in $\mathcal{M}[P_{int}]$.

If there exists some element of $\mathcal{M}[P_{int}]$ for which $f$ is not defined, then there
exists either an element $y > 0_{int}$ such that the interval $[0_{int}, y]$ contains
infinitely many elements satisfying $P_{int}$, or an element $y < 0_{int}$ such that the
interval $[y, 0_{int}]$ contains infinitely many elements satisfying $P_{int}$. Since both
cases are symmetric, we only address the former. There must exist a strictly
increasing infinite sequence of elements in $\mathcal{M}[P_{int}]$ bounded by $y$. Let us consider
its limit $z \in \mathbb{R}$. Because there must exist an element of $\mathcal{M}[P_{int}]$ smaller than $z$
and arbitrarily close to $z$, it follows that $z$ cannot have a predecessor, which
contradicts the predecessor axiom. Therefore $f$ is well defined, and every element
of $\mathcal{M}[P_{int}]$ is associated with an integer number.
The mapping $f$ is therefore a bijection.


3.2 Translating Formulas 


We are now able to describe the satisfiability-preserving translation of formulas
from UF1-IDL-IRO to UF1-RO. Consider a UF1-IDL-IRO formula $\varphi$. Without loss of
generality, we assume that $P_{int}$ does not appear in $\varphi$. The translation of $\varphi$ is
defined as

$$\psi = \mathrm{AXIOMS}_{int}(P_{int}) \land [\varphi],$$

where $\mathrm{AXIOMS}_{int}(P_{int})$ is the conjunction of the axioms of $P_{int}$ and $[\cdot]$ is a
translation operator. This translation operator $[\cdot]$ distributes over all Boolean
operators and quantifiers, and corresponds to the identity transformation for
most considered atoms, except in the following cases:


548 B. Boigelot et al. 


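- $[x \in \mathbb{Z}] = P_{int}(x)$;
- $[x - y \bowtie c] = \exists z_0, \ldots, z_c.\; (y = z_0) \land (x \bowtie z_c) \land \bigwedge_{0 \le i < c} \mathit{Succ}(z_{i+1}, z_i)$,
  for $c \in \mathbb{N}$ and $\bowtie \in \{<, \le, =, \ge, >\}$. We assume that $z_0, \ldots, z_c$ are fresh
  variables w.r.t. $x$ and $y$.


Example: $[x - y < 2] = \exists z_0, z_1, z_2.\; y = z_0 \land \mathit{Succ}(z_1, z_0) \land \mathit{Succ}(z_2, z_1) \land x < z_2$.
Notice that we only deal with the case $c \in \mathbb{N}$ since every atom of the form $x - y \bowtie c$
with $c \in \mathbb{Z} \setminus \mathbb{N}$ and $\bowtie \in \{<, \le, =, \ge, >\}$ can be rewritten as $y - x \bowtie' -c$ with the
following correspondences: $(\bowtie, \bowtie') \in \{(=, =), (<, >), (>, <), (\ge, \le), (\le, \ge)\}$.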
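
The translation of a single difference atom can be sketched as a small string-producing function. This is only an illustration: the function name, the textual output syntax, and the naming scheme `z0, z1, ...` for the fresh variables are assumptions, and the caller is assumed to guarantee that these names do not clash with `x` and `y`.

```python
# Mirror each comparison when x - y (op) c is rewritten as y - x (op') -c.
FLIP = {"=": "=", "<": ">", ">": "<", "<=": ">=", ">=": "<="}

def translate_difference(x, y, op, c):
    """Return the translation of the atom x - y (op) c as a formula string."""
    if c < 0:
        # Only constants in N are handled directly; otherwise swap the variables.
        return translate_difference(y, x, FLIP[op], -c)
    zs = [f"z{i}" for i in range(c + 1)]   # fresh variables z0..zc (assumed unused)
    parts = [f"{y} = {zs[0]}"]
    # Chain of c successor steps starting from y.
    parts += [f"Succ({zs[i + 1]}, {zs[i]})" for i in range(c)]
    parts.append(f"{x} {op} {zs[c]}")
    return f"exists {', '.join(zs)}. " + " and ".join(parts)

print(translate_difference("x", "y", "<", 2))
```

With a negative constant, for instance `translate_difference("x", "y", "<=", -1)`, the function first rewrites the atom as `y - x >= 1` and then builds a chain of one successor step starting from `x`.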


3.3 Establishing Equisatisfiability 


Given a UF1-IDL-IRO formula $\varphi$, the translation that we have introduced generates
a corresponding UF1-RO formula $\psi$. To establish that they are equisatisfiable, we
need to prove that if $\varphi$ admits a model, then $\psi$ also admits one, and conversely.


Lemma 2. Given a UF1-IDL-IRO formula $\varphi$, consider its translation into UF1-RO
$\psi = \mathrm{AXIOMS}_{int}(P_{int}) \land [\varphi]$. The formulas $\varphi$ and $\psi$ are equisatisfiable.


Proof. If $\varphi$ is satisfiable, let $\mathcal{M}$ be one of its models. Then, since $\psi$ shares the
same free variables and predicates as $\varphi$, with the only addition of $P_{int}$, we
can directly construct a model $\mathcal{M}'$ of $\psi$ that agrees with $\mathcal{M}$ on the shared
variables and predicates, and that interprets $P_{int}$ so that $P_{int}(x)$ holds exactly when
$x \in \mathbb{Z}$. This is always possible since the only constraints on $P_{int}$ generated by
the construction of $\psi$ are the axioms stated above.

If $\psi$ is satisfiable, then there exists a model $\mathcal{M}$ of $\psi$. Let us construct a
model $\mathcal{M}'$ of $\varphi$. Let $0_{int} \in \mathbb{R}$ be an arbitrary element of $\mathcal{M}[P_{int}]$. We define an
automorphism $g$ of $\mathbb{R}$, such that $g(0_{int}) = 0$, and recursively $g(y) = g(x) + 1$ for
$x, y \in \mathcal{M}[P_{int}]$, $y > 0_{int}$ and $\mathit{Succ}(y, x)$, and $g(y) = g(x) - 1$ for $x, y \in \mathcal{M}[P_{int}]$,
$y < 0_{int}$ and $\mathit{Succ}(x, y)$. The automorphism $g$ maps each open interval between
the $k$-th and $(k + 1)$-th successors (resp. predecessors) of $0_{int}$ in $\mathcal{M}[P_{int}]$ onto
the open interval $(k, k + 1)$ (resp. $(-(k + 1), -k)$) while preserving order.

$\mathcal{M}'$ is defined by $\mathcal{M}'[x] = g(\mathcal{M}[x])$ for each free variable $x$ of the formula
$\varphi$, and $\mathcal{M}'[P] = \{g(x) \mid x \in \mathcal{M}[P]\}$ for each uninterpreted predicate $P$ of $\varphi$.
No unary predicate atom can be violated by $\mathcal{M}'$ by definition. Furthermore, no
order constraint can be violated by $\mathcal{M}'$ either since $g$ preserves order. Regarding
the difference logic constraints, the intermediate variables $z_i$ introduced in the
translation are necessarily mapped to values in $\mathcal{M}[P_{int}]$ since the $\mathit{Succ}$ relation
enforces this property. Hence for each such variable, we have $g(\mathcal{M}[z_i]) \in \mathbb{Z}$.
Intuitively, this ensures that in $\mathcal{M}'$ the difference between the values taken by
the integer variables is consistent with the difference logic constraints. It follows
that $\mathcal{M}'$ is a model of $\varphi$.
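
One concrete way to obtain such a map $g$ is piecewise-linear interpolation between consecutive elements of $\mathcal{M}[P_{int}]$. The sketch below makes that choice, which the proof itself does not prescribe, and works on a finite sorted window of the set only; `make_g`, `points` and `origin_index` are hypothetical names, and the window must contain at least two points.

```python
from bisect import bisect_right
from fractions import Fraction

def make_g(points, origin_index):
    """Build g on [points[0], points[-1]] from a finite sorted window of M[P_int]."""
    def g(x):
        if not points[0] <= x <= points[-1]:
            raise ValueError("x lies outside the finite window")
        # Index of the piece [points[k], points[k + 1]] that contains x.
        k = min(bisect_right(points, x) - 1, len(points) - 2)
        left, right = points[k], points[k + 1]
        # Linear interpolation sends [left, right] onto [k, k + 1]; then recentre
        # so that the chosen origin points[origin_index] is mapped to 0.
        return (k - origin_index) + Fraction(x - left) / Fraction(right - left)
    return g

# Hypothetical window of M[P_int]; the element -1/2 plays the role of 0_int.
points = [Fraction(-3), Fraction(-1, 2), Fraction(1, 3), Fraction(5)]
g = make_g(points, 1)
```

On this window, the four elements are sent to the integers -1, 0, 1, 2, and every value in between is sent strictly between the images of its two neighbours, so order is preserved.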


4 Undecidability of UF1-RDL 


The result presented in the previous section establishes a lower bound for the
decidability of our family of fragments. A natural follow-up problem is to establish
a corresponding upper bound, i.e., to find an extension of this logic that
yields undecidability. We show here that, when combined with uninterpreted
unary predicates, as soon as difference logic constraints on reals are allowed, the
logic becomes undecidable.

We actually show a stronger result, namely that a single unary predicate symbol
is enough to yield undecidability. More precisely, we establish the undecidability
of the restriction of UF1-RDL where only one predicate symbol is allowed,
by reducing the halting problem of a Turing machine to the satisfiability problem
over this restriction of UF1-RDL.


Theorem 2. Satisfiability is undecidable for UF1-RDL with a single predicate.

Corollary 1. Satisfiability is undecidable for UF1-RDL.


The remainder of this section is dedicated to proving Theorem 2. We consider
w.l.o.g. Turing machines defined over an alphabet with only two symbols and no 
explicit blank symbol [16]. This choice leads to a simpler proof. 


4.1 Definitions 


The proof is by reduction from the halting problem for a Turing machine with
a single bi-infinite tape, starting from a blank tape (i.e., a tape filled with the
symbol 0). Consider a Turing machine $M = (Q, \Sigma, q_I, q_F, \Delta)$, where


- $Q$ is a finite nonempty set of states,

- $\Sigma$ is the alphabet $\{0, 1\}$,

- $q_I \in Q$ is the initial state,

- $q_F \in Q$ is the halting state,

- $\Delta \subseteq (Q \setminus \{q_F\}) \times \Sigma \times Q \times \Sigma \times \{L, R\}$ is the transition relation, assumed to
  be total over its first two components, i.e., for any pair $(q, \alpha) \in (Q \setminus \{q_F\}) \times \Sigma$,
  there exists a tuple $(q, \alpha, q', \alpha', \lambda) \in \Delta$.


A configuration $C$ of such a Turing machine is a triplet containing the current
state $q$, the content of the tape $t \in \{0, 1\}^{\mathbb{Z}}$ and the position of the head $h \in \mathbb{Z}$.
Since the machine starts from a blank tape, the initial configuration is $C_0 =
(q_I, 0^{\mathbb{Z}}, 0)$.

A run $\rho$ of length $n \in \mathbb{N}$ (resp. $n = +\infty$) of such a Turing machine is a finite
(resp. infinite) sequence of configurations $(C_i)_{i \in [0, n)}$ (resp. $(C_i)_{i \in \mathbb{N}}$), such that for
any two consecutive configurations $C_i = (q_i, t_i, h_i)$ and $C_{i+1} = (q_{i+1}, t_{i+1}, h_{i+1})$
there exists a transition $(q, \alpha, q', \alpha', \lambda) \in \Delta$ such that:


- $q = q_i$ and $q' = q_{i+1}$,

- $t_i[h_i] = \alpha$, i.e., the tape cell at position $h_i$ contains the symbol $\alpha$,

- $t_{i+1}[h_i] = \alpha'$,

- $t_{i+1}[k] = t_i[k]$ for every $k \in \mathbb{Z}$, $k \ne h_i$,

- $h_{i+1} = h_i + 1$ if $\lambda = R$, and $h_{i+1} = h_i - 1$ if $\lambda = L$.


A halting run is a finite run such that the state of its last configuration is the
halting state $q_F$.
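
The definitions above can be exercised with a small simulator. This is a sketch under a simplifying assumption: the transition relation is represented as a deterministic table (a dict), whereas the text only requires $\Delta$ to be total. The names `step`, `run` and the sample tables in the usage note are hypothetical.

```python
from collections import defaultdict

def step(config, delta):
    """Apply one transition to config = (state, tape, head); tape maps cells to 0/1."""
    state, tape, head = config
    new_state, written, move = delta[(state, tape[head])]
    new_tape = defaultdict(int, tape)      # unvisited cells read as the symbol 0
    new_tape[head] = written               # only the cell under the head changes
    return new_state, new_tape, head + (1 if move == "R" else -1)

def run(delta, initial, halting, max_steps):
    """Follow the run from the blank tape; return the halting configuration or None."""
    config = (initial, defaultdict(int), 0)
    for _ in range(max_steps):
        if config[0] == halting:
            return config
        config = step(config, delta)
    return config if config[0] == halting else None
```

For example, with the table `{("a", 0): ("b", 1, "R"), ("a", 1): ("a", 1, "R"), ("b", 0): ("halt", 1, "L"), ("b", 1): ("b", 1, "L")}`, calling `run(table, "a", "halt", 10)` reaches the halting state after two steps. Since halting is undecidable in general, the bound `max_steps` only makes the sketch terminate and says nothing about runs that exceed it.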


4.2 Encoding Runs


Our goal is to encode a run of a Turing machine (as described before), i.e., encode 
the state, the tape content, and the position of the head for each configuration 
of such a run. Starting from the initial configuration, we must also ensure the 
coherence of the run w.r.t. the Turing machine transition relation, by connecting 
every two consecutive configurations. Our idea is to define an infinite sequence 
of intervals on the real line, such that each interval contains the encoding of its 
corresponding configuration (i.e., the first interval will contain the first configu- 
ration of the run, and so on). Difference constraints can then be used to connect 
consecutive configurations. 

Let $N = \lceil \log_2(|Q|) \rceil$. Each state $q \in Q$ of $M$ can therefore be uniquely
encoded with $N$ Boolean values $b_1^q, \ldots, b_N^q$. We want to encode consecutive
configurations of the Turing machine using a single predicate $P$ over $\mathbb{R}$. In order to
do so, we first need to describe a subset of R that will act as a grid supporting 
the encoding of the state, the tape content, and the head position of the current 
configuration. 
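
As an illustration of the state encoding, the sketch below assigns to each state the binary expansion of its index in an arbitrary enumeration of $Q$. The enumeration order and the name `state_bits` are assumptions; any injective assignment of $N$ Booleans to states would do.

```python
def state_bits(states):
    """Map each state to a tuple of N Booleans, where N = ceil(log2(|Q|))."""
    # For |Q| >= 1, (|Q| - 1).bit_length() equals ceil(log2(|Q|)) exactly.
    n = (len(states) - 1).bit_length()
    return {
        q: tuple(bool((index >> (n - 1 - k)) & 1) for k in range(n))
        for index, q in enumerate(states)
    }

encoding = state_bits(["qI", "q1", "qF"])   # hypothetical three-state machine
```

Here `encoding["qF"]` is `(True, False)`: three states need $N = 2$ Booleans, and distinct states always receive distinct tuples.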

We use the concept of linear ordering [15] to describe the shape of the grid.
A linear ordering $J$ is a totally ordered set, i.e., a set equipped with a binary
relation $<$ which is irreflexive (for all $j$ in $J$, $j \not< j$), asymmetric (for all $j, k$ in
$J$, if $j < k$, then $k \not< j$), transitive (for all $i, j, k$ in $J$, if $i < j$ and $j < k$, then
$i < k$), and complete (for all $j, k \in J$, either $j = k$, $j < k$, or $k < j$). The order
type of a linear ordering $J$ is the class of all linear orderings $<$-isomorphic to $J$.
The order types of a singleton, the set composed of the first $N$ natural numbers,
$\mathbb{N}$, and $\mathbb{Z}$ are respectively denoted by $1$, $N$, $\omega$, and $\zeta$. The concatenation of two
linear orderings $J$ and $K$ (where their associated order relations are respectively
$<_J$ and $<_K$) is denoted by $J + K$. It corresponds to the linear ordering composed
of the set of pairs $\{(j, 1) \mid j \in J\} \cup \{(k, 2) \mid k \in K\}$, and equipped with the order
relation $<$ defined by $(j_1, 1) < (j_2, 1)$ if $j_1 <_J j_2$, $(k_1, 2) < (k_2, 2)$ if $k_1 <_K k_2$,
and $(j, 1) < (k, 2)$ for every $j \in J$ and $k \in K$. More generally, given two linear
orderings $J$ and $K$, the linear ordering $(J)^K$ is the set of pairs $(j, k)$ with $j \in J$
and $k \in K$, with the order relation $<$ such that $(j_1, k_1) < (j_2, k_2)$ if either
$k_1 <_K k_2$, or $k_1 =_K k_2$ and $j_1 <_J j_2$. These operators are naturally extended to
order types. For instance, the order type $(\omega)^{\omega}$ is the class of all linear orderings
$<$-isomorphic to $\mathbb{N}^2$.
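
The two operators on linear orderings translate directly into comparison functions. The following sketch assumes that the orders of $J$ and $K$ are given as strict comparison functions and that equality in $K$ is Python's `==`; the names `less_concat` and `less_power` are hypothetical.

```python
from operator import lt

def less_concat(a, b, less_j, less_k):
    """Order on J + K; elements are tagged pairs (value, 1) for J or (value, 2) for K."""
    (u, tag_u), (v, tag_v) = a, b
    if tag_u != tag_v:
        return tag_u < tag_v            # every element of J precedes every element of K
    return less_j(u, v) if tag_u == 1 else less_k(u, v)

def less_power(a, b, less_j, less_k):
    """Order on (J)^K; pairs (j, k) are compared on k first, then on j."""
    (j1, k1), (j2, k2) = a, b
    if less_k(k1, k2):
        return True
    return k1 == k2 and less_j(j1, j2)

# With J = K = the natural numbers: (9, 0) precedes (0, 1) because 0 < 1 in K.
print(less_power((9, 0), (0, 1), lt, lt))
```

In the grid of the next paragraphs, the second component plays the role of the configuration index and the first component the position inside one configuration.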

The grid we consider is a linear ordering that is a subset of $\mathbb{R}$, of order type
$(N + \zeta + 1 + \zeta)^{\omega}$. An ordering of order type $N + \zeta + 1 + \zeta$ within the interval
(0,3) is depicted in Fig. 1. Each dot corresponds to a natural number and each
vertical line corresponds to an element of the linear ordering. The first $N$ points
will support the encoding of a state. The first subordering that is $<$-isomorphic
to $\mathbb{Z}$ (i.e., of order type $\zeta$) will be used to encode the position of the head, while
the second one will support the encoding of the tape content. The whole grid is
composed of an infinite repetition of the subordering $N + \zeta + 1 + \zeta$ (i.e., it is
repeated on the intervals $[3k, 3k + 3)$ for all $k \in \mathbb{N}$), hence the $\omega$ exponent.

Fig. 1. A visual representation of a linear ordering of order type $N + \zeta + 1 + \zeta$,
which supports the complete encoding of one configuration.


4.3 Defining the Support of the Encoding 


Let us first define concretely the support of the encoding of the Turing machine 
configurations. The difficulty lies in describing the grid using a single predicate 
P, without meddling with the actual encoding of the configurations afterwards. 
Our solution is to characterize the points that belong to the grid by enforcing 
that such a point is surrounded by an open interval where P is uniformly true 
on the left, and by an open interval where P is uniformly false on the right, as
depicted in Fig. 2. We do not specify yet how P behaves on x, as this is how
the configurations will actually be encoded later. 


Fig. 2. The real number $x$ belongs to the grid, since it is surrounded by a true (black)
open interval on the left, and a false (white) open interval on the right.


Such a characterization is easy to express in our restriction of UF1-RDL:

$$\mathit{Support}(x) \equiv (\exists y.\, y < x \land \forall z.\, y < z < x \Rightarrow P(z)) \land (\exists y.\, x < y \land \forall z.\, x < z < y \Rightarrow \neg P(z))$$

Let us now partially axiomatize the predicate $P$ such that the set of supporting
points constitutes a linear ordering of order type $(N + \zeta + 1 + \zeta)^{\omega}$:


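(a) Let $\mathbf{0}$ be a variable and $\mathbf{1}$, $\mathbf{2}$ and $\mathbf{3}$ be respectively the $+1$-successor of $\mathbf{0}$, $\mathbf{1}$
    and $\mathbf{2}$:

    $$\mathit{Axiom}_1 \equiv (\mathbf{1} = \mathbf{0} + 1) \land (\mathbf{2} = \mathbf{1} + 1) \land (\mathbf{3} = \mathbf{2} + 1)$$

    These free variables are implicitly existentially quantified in the final formula.
    Notice that the variable $\mathbf{0}$ can be interpreted as any real value, which only
    acts as a landmark for the beginning of the grid.

(b) $\mathbf{0}$, $\mathbf{1}$ and $\mathbf{2}$ are supporting points:

    $$\mathit{Axiom}_2 \equiv \mathit{Support}(\mathbf{0}) \land \mathit{Support}(\mathbf{1}) \land \mathit{Support}(\mathbf{2})$$

(c) $P$ is uniformly true before $\mathbf{0}$, i.e., there are no supporting points before $\mathbf{0}$:

    $$\mathit{Axiom}_3 \equiv \forall x.\, x < \mathbf{0} \Rightarrow P(x)$$

(d) There are exactly $N - 2$ supporting points within the interval $(\mathbf{0}, \mathbf{1})$:

    $$\mathit{Axiom}_4 \equiv \exists x_1, x_2, \ldots, x_N.\; x_1 = \mathbf{0} \land x_N = \mathbf{1} \land \bigwedge_{1 \le i < N} (\mathbf{0} \le x_i \le \mathbf{1} \land \mathit{Succ}_{Supp}(x_{i+1}, x_i))$$

    where $\mathit{Succ}_{Supp}(x, y)$ is a formula that states that $x$ is the first supporting
    real value that is strictly greater than $y$, i.e., $x$ is the successor of $y$ on the
    grid. It is defined as follows:

    $$\mathit{Succ}_{Supp}(x, y) \equiv y < x \land \mathit{Support}(x) \land \mathit{Support}(y) \land \forall z.\, y < z < x \Rightarrow \neg \mathit{Support}(z)$$

    We also define an analogous formula to express that $x$ is the predecessor of
    $y$: $\mathit{Pred}_{Supp}(x, y) \equiv \mathit{Succ}_{Supp}(y, x)$.

(e) The set of supporting points within $(\mathbf{1}, \mathbf{2})$ is $<$-isomorphic to $\mathbb{Z}$. This is
    done similarly to the axiomatization of $P_{int}$ (cf. Section 3.1). But because $\mathbf{1}$
    (resp. $\mathbf{2}$) is a supporting point, there must exist a uniformly false (resp. true)
    interval of $P$ at its right (resp. left) where no other supporting points can
    appear. All the supporting points will therefore be constrained to appear
    within a smaller interval $(b_1, b_2)$ with $\mathbf{1} < b_1 < b_2 < \mathbf{2}$, as illustrated in
    Fig. 3.

    $$
    \begin{aligned}
    \mathit{Axiom}_5 \equiv \exists b_1, b_2.\; & [\mathbf{1} < b_1 < b_2 < \mathbf{2}] && (1) \\
    \land\; & [\forall x.\, (b_1 < x < b_2) \Rightarrow \exists y.\, x < y < b_2 \land \mathit{Support}(y) \land \forall z.\, x < z < y \Rightarrow \neg \mathit{Support}(z)] && (2) \\
    \land\; & [\forall x.\, (b_1 < x < b_2) \Rightarrow \exists y.\, b_1 < y < x \land \mathit{Support}(y) \land \forall z.\, y < z < x \Rightarrow \neg \mathit{Support}(z)] && (3) \\
    \land\; & [\forall x.\, (\mathbf{1} < x < \mathbf{2} \land \mathit{Support}(x)) \Rightarrow b_1 < x < b_2] && (4)
    \end{aligned}
    $$

    This axiom can be broken down into these elementary pieces:

    (1) there exists an open interval $(b_1, b_2)$ such that $\mathbf{1} < b_1 < b_2 < \mathbf{2}$,

    (2) each real value in $(b_1, b_2)$ has a supporting successor,

    (3) each real value in $(b_1, b_2)$ has a supporting predecessor,

    (4) there are no supporting points within $(\mathbf{1}, b_1)$, nor within $(b_2, \mathbf{2})$.

(f) The pattern of supporting points within $(\mathbf{1}, \mathbf{2})$ is repeated onto the interval
    $(\mathbf{2}, \mathbf{3})$ with an exact offset of 1:

    $$\mathit{Axiom}_6 \equiv \forall x.\, \mathbf{1} < x < \mathbf{2} \Rightarrow (\mathit{Support}(x) \equiv \mathit{Support}(x + 1))$$

(g) The pattern of supporting points within $[\mathbf{0}, \mathbf{3})$ is repeated onto every interval
    $[3k, 3k + 3)$ for $k \in \mathbb{N}$:

    $$\mathit{Axiom}_7 \equiv \forall x.\, x \ge \mathbf{0} \Rightarrow (\mathit{Support}(x) \equiv \mathit{Support}(x + 3))$$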


Notice that for $\mathit{Axiom}_7$, it is not enough that a similar pattern appears within
each interval $[3k, 3k + 3)$: there must be an exact offset of 3 with the previous
interval. This is mandatory to connect two consecutive configurations and ensure
that they are coherent with the transition relation of the Turing machine, as
defined later. The same goes for $\mathit{Axiom}_6$, where the exact offset of 1 makes it possible to
connect the position of the head to the tape content within a single configuration.

The formula $\mathrm{AXIOMS}_{supp} = \bigwedge_{1 \le k \le 7} \mathit{Axiom}_k$ axiomatizes the predicate $P$.


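Fig. 3. The points of the grid surrounded by open true (black) and false (white)
intervals within (1, 2). The axis is marked with 1, $b_1$, $b_2$ and 2.


Fig. 4. A model for the axiomatization of $P$ over the interval $(-\infty, 1)$. The axis is
marked with $-\infty$, 0 and 1, and the supporting points are labeled $s_1, s_2, \ldots, s_N$.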


Lemma 3. The formula $\mathrm{AXIOMS}_{supp}$ is consistent.


The proof sketch below provides the key ideas to construct a model of
$\mathrm{AXIOMS}_{supp}$. The complete construction is described in [2].


Proof. Let us construct a subset S of R that is a model of AXIOMS supp- Firstly, 
we make every negative number belong to S, which ensures that there do not 
exist negative supporting points. The interval [0,1] is then cut into 2N — 2 
intervals of equal length, which alternate between being included in S, and being 
disjoint from S. This ensures the existence of exactly N — 1 supporting points 
within the interval (—oo, 1), 0 being the first; 1 will be considered later. These 
N — 1 supporting points are referred to as s1,59,...Sy_— and are depicted in 
Fig.4. Recall that the supporting points are exactly those surrounded by an 
interval of S' (i.e., black on the figure) on the left, and an interval disjoint from 
S (i.e., white) on the right. 

In order to make the real value 1 the N-th supporting point, it is enough 
to make an interval on its right disjoint from S, e.g., the interval (1,1 + 4). 
Symmetrically, we make the interval (2 — L, 2) included in S, satisfying the left 
part of the requirement for the real value 2 to be a supporting point. 
We further characterize S such that the set of supporting points within the 
interval (1+ L, 2— F) is <-isomorphic to Z. This can be done by partitioning the 


Fig. 5. A model for the axiomatization of P over the interval (1, 2). 
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open interval (1+ L, 2— ł) into a bi-infinite sequence of open intervals alternating 
between being included and disjoint from S, as depicted in Fig. 5. 

The whole pattern described on the interval (1,2) can be directly transposed 
onto the interval (2,3) with an exact offset of +1. Similarly, the distribution of 
S over the interval (0,3) can be transposed onto every interval (3k, 3k +3) with 
an offset of +3k, for k > 0. The only real values for which we do not describe 
their relation with S are the points surrounded by an interval included in S on 
one side, and an interval disjoint from S on the other side. These points never 
conflict with the axiomatization $\mathit{AXIOMS}_{supp}$, which only deals with non-empty
open intervals.

By construction, S satisfies each axiom of the formula $\mathit{AXIOMS}_{supp}$, and is
therefore a model of this formula.


4.4 Encoding a Configuration of the Turing Machine 


Now that the supporting grid has been properly defined, the actual encoding of 
a given configuration can be addressed. That is, the state, the tape content and 
the head position of the (k + 1)-th configuration of a run are encoded on the 
supporting points contained within the interval [3k, 3k + 3). 


Encoding the State. Encoding the state of a given configuration is rather
direct since we defined the grid to contain N consecutive supporting points
within every interval [3k, 3k + 1] for $k \in \mathbb{N}$, that can support the encoding
of a state. We only need to indicate that we start reading the encoding on a
multiple of 3. However, the logic UF1-RDL does not allow expressing periodicity
constraints on variables. Nevertheless, thanks to our axiomatization, 0 and every
other positive multiple of 3 are the only points that simultaneously have no
supporting predecessor, while admitting a supporting successor. These properties
are expressible as follows:

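$$\mathit{NoPred}_{supp}(x) \equiv \forall z.\, (z < x \land \mathit{Support}(z)) \Rightarrow \exists y.\, z < y < x \land \mathit{Support}(y)$$
$$\mathit{HasSucc}_{supp}(x) \equiv \exists z.\, x < z \land \mathit{Support}(z) \land \forall y.\, x < y < z \Rightarrow \lnot \mathit{Support}(y)$$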

For convenience, we introduce the formula EncodingBegins to characterize a real 
value x on which the encoding of a state starts: 

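$$\mathit{EncodingBegins}(x) \equiv \mathit{Support}(x) \land \mathit{NoPred}_{supp}(x) \land \mathit{HasSucc}_{supp}(x)$$

Furthermore, the formula $\mathit{State}_q$ expresses that a state $q \in Q$ is encoded on a
given real number x and its N - 1 supporting successors:

$$\mathit{State}_q(x) \equiv \mathit{EncodingBegins}(x) \land \exists y_1, \ldots, y_N.\; x = y_1 \land \bigwedge_{1 \le i < N} \mathit{Succ}_{supp}(y_{i+1}, y_i) \land \bigwedge_{1 \le i \le N} P(y_i) = b_i^q$$

where $P(y_i) = b_i^q$ is a shorthand for $P(y_i)$ if $b_i^q = \top$, and $\lnot P(y_i)$ if $b_i^q = \bot$.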


Encoding the Head Position. The position of the head is encoded in the
second part of the grid, that is, in the interval (3k + 1, 3k + 2) for the (k + 1)-th
configuration (cf. Fig. 1). The grid on this interval is <-isomorphic to Z. Each
element of this subordering will correspond to a position of the tape. When the
predicate P is true at such a point, it means that the head points towards that
cell. Since the Turing machines that we consider here have a single read/write
head, it must point towards a unique cell for each configuration. Therefore P
must be true only for a single element of that subordering.


Encoding the Tape Content. Similarly, the tape content is encoded in the 
third part of the grid, that is, in the interval (3k + 2,3k + 3) for the (k + 1)-th 
configuration (cf. Fig. 1). Again, the grid on this interval is <-isomorphic to Z. 
And again, each element x of this subordering will correspond to a cell of the 
tape, matching the cell that corresponds to x — 1 in the head position interval. 
Figure 6 illustrates the connections between the suborderings, within a single 
configuration and with the next one. The idea of the encoding is to simply set 
the value of P to true on the elements of the subordering that correspond to 
cells containing a 1, and to false for cells containing a 0. 


Fig. 6. The first two consecutive configuration encodings. 


4.5 Enforcing a Valid Run 


Let us now define formally the formulas characterizing an accepting run of M.
We will decompose the global formula into three main parts: the initial
conditions $\mathit{START}_M$, the conditions on the transitions $\mathit{STEP}_M$, and the halting
condition $\mathit{END}_M$. For the sake of clarity, we use capital letters for these
higher-level formulas.

The initial conditions of M are that the state encoded on 0 and its N - 1
supporting successors is the initial state $q_0$, that the head points towards a unique
initial unspecified cell of the tape, and finally that the tape is initially filled with
0's. These conditions are expressed by the following formula:


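$$\begin{aligned}
\mathit{START}_M \equiv\ & \mathit{State}_{q_0}(0) \land [\exists y.\, 1 < y < 2 \land \mathit{Support}(y) \land P(y) \\
& \quad \land \forall x.\, (1 < x < 2 \land \mathit{Support}(x) \land P(x)) \Rightarrow x = y] \\
& \land [\forall y.\, (2 < y < 3 \land \mathit{Support}(y)) \Rightarrow \lnot P(y)]
\end{aligned}$$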
The requirements on the transition are more complex. Intuitively, if before
reaching the step $i \in \mathbb{N}$ we have not yet encountered the halting state $q_F$,
then we must ensure that the configuration at step i can be obtained from the
configuration at the previous step i - 1 by following a transition
$(q, \alpha, q', \alpha', \lambda) \in \Delta$. The overall formula for this condition is the following:


$$\mathit{STEP}_M \equiv \forall y.\, (y > 0 \land \mathit{EncodingBegins}(y) \land \mathit{NotEnded}_M(y)) \Rightarrow \exists x.\, y = x + 3 \land \mathit{Transition}_M(x, y)$$


The subformula $\mathit{NotEnded}_M(y)$ expresses that no valid real value prior to y
(i.e., a positive multiple of 3 strictly smaller than y) encodes the halting state.
This formula is defined by:


$$\mathit{NotEnded}_M(y) \equiv \forall x.\, (x < y \land \mathit{EncodingBegins}(x)) \Rightarrow \lnot \mathit{State}_{q_F}(x)$$


The subformula $\mathit{Transition}_M(x, y)$ expresses that there exists a transition
$(q, \alpha, q', \alpha', \lambda) \in \Delta$ that allows moving in one step from the configuration
encoded at x (i.e., the encoding of the configuration starts exactly on x)
to the configuration corresponding to y. To improve readability, we decompose
the condition on the transition relation as follows:


$$\mathit{Transition}_M(x, y) \equiv \bigvee_{(q, \alpha, q', \alpha', \lambda) \in \Delta} [\mathit{State}_q(x) \land \mathit{State}_{q'}(y) \land \mathit{Tape}_{\alpha,\alpha'}(x, y) \land \mathit{Head}_\lambda(x, y)]$$


For a given transition $(q, \alpha, q', \alpha', \lambda) \in \Delta$, the conditions on the states, tape
and head are expressed as follows:


– The state q must be encoded on the real value x, and the state q' on y:
$\mathit{State}_q(x) \land \mathit{State}_{q'}(y)$

– The tape must contain $\alpha \in \{0, 1\}$ at the position of the head for the step
corresponding to x. Additionally, for the step corresponding to y, the tape
must contain $\alpha'$ at the previous position of the head, and remain unchanged
at all other positions.


$$\begin{aligned}
\mathit{Tape}_{\alpha,\alpha'}(x, y) \equiv\ & [\forall z.\, (x + 1 < z < x + 2 \land \mathit{Support}(z) \land P(z)) \\
& \quad \Rightarrow P(z + 1) = \alpha \land P(z + 4) = \alpha'] \\
& \land [\forall z.\, (x + 1 < z < x + 2 \land \mathit{Support}(z) \land \lnot P(z)) \Rightarrow (P(z + 1) \Leftrightarrow P(z + 4))]
\end{aligned}$$


where $P(z + k) = \alpha$ is a shorthand for $\exists u.\, u = z + k \land P(u)$ if $\alpha = 1$, and
$\exists u.\, u = z + k \land \lnot P(u)$ if $\alpha = 0$. The “+ 1” operator allows us to connect the
encoding of the head position with the encoding of the tape content within
the same configuration. The “+ 4” operator does the same while jumping to
the next configuration (cf. Fig. 4). Notice that this formula does not involve y;
it assumes (rightfully, given the formula $\mathit{STEP}_M$) that the equality y = x + 3
holds.

– The head is moved in the direction specified by $\lambda \in \{L, R\}$, i.e., left for L and
right for R. This can be expressed by exploiting the predecessor and successor
relations defined for supporting real values.


Heady (x,y) = Vz. (x +1 <z < xz+2^ Support(z) ^A P(z)) 
=> dv. falv, z+ 3) A P(w) A=P(z) 
where $f_R = \mathit{Succ}_{supp}$ and $f_L = \mathit{Pred}_{supp}$. Since in the initial configuration
of the Turing machine the head points towards a single cell, the formula
$\mathit{Head}_\lambda$ ensures that this remains the case throughout every run of the Turing
machine.
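
The conditions on tape and head can be mirrored on an ordinary Turing-machine step. The Python sketch below is only an illustration under simplifying assumptions: cells are indexed by integers instead of supporting points, a missing cell reads as 0, and moving right increases the index. The names step and satisfies_tape_and_head are hypothetical and do not come from the paper.

```python
def step(config, transition):
    """Apply one transition (q, a, q2, a2, move) to config = (state, tape, head).

    tape maps integer cell indices to 0/1 (missing cells read as 0).
    Returns None when the transition is not enabled in this configuration.
    """
    state, tape, head = config
    q, a, q2, a2, move = transition
    if state != q or tape.get(head, 0) != a:
        return None
    new_tape = dict(tape)
    new_tape[head] = a2                        # written at the old head position
    new_head = head + 1 if move == "R" else head - 1
    return (q2, new_tape, new_head)

def satisfies_tape_and_head(before, after, transition, cells):
    """Check the tape and head conditions cell by cell over a finite window."""
    _, tape, head = before
    _, tape2, head2 = after
    _, a, _, a2, move = transition
    for c in cells:
        if c == head:                          # the head points at this cell
            if tape.get(c, 0) != a or tape2.get(c, 0) != a2:
                return False
            expected = c + 1 if move == "R" else c - 1
            if head2 != expected:              # head moved to the neighbour cell
                return False
        elif tape.get(c, 0) != tape2.get(c, 0):
            return False                       # all other cells stay unchanged
    return True
```

A configuration produced by step passes the check, while a successor whose tape differs away from the head is rejected.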


Finally, the existence of a halting run is expressed by the formula: 


$$\mathit{END}_M \equiv \exists x.\, \mathit{State}_{q_F}(x)$$


The global formula that expresses that the Turing machine M halts on some
run encoded by the value of the predicate P is the following:


$$\mathit{HALT}_M \equiv \mathit{START}_M \land \mathit{STEP}_M \land \mathit{END}_M \land \mathit{AXIOMS}_{supp}$$


where $\mathit{AXIOMS}_{supp}$ is the axiomatization of the supporting points as described
in Sect. 4.3.

By construction, satisfiability of the global formula $\mathit{HALT}_M$ is equivalent to
the existence of a halting run for the Turing machine M. It follows that the
satisfiability problem for UF1-RDL is undecidable, which proves Theorem 2.


5 Conclusion 


This work provides a lower and an upper bound for the decidability of first- 
order fragments with quantifiers mixing uninterpreted unary predicates and weak 
forms of real arithmetic. This draws a precise picture of the frontier of decid- 
ability in fragments mixing real arithmetic and uninterpreted predicates. 

We proved the decidability of the fragment UF1-IDL-IRO, where uninterpreted 
unary predicates, order constraints between real and integer variables, and dif- 
ference logic constraints between integer variables are allowed. This result is 
a consequence of the already established decidability of its restriction UF1-RO, 
where only uninterpreted unary predicates and order constraints between real 
values are allowed. To the best of our knowledge, there does not yet exist a
practical decision procedure for UF1-RO.

There exist fragments of arithmetic that are more expressive than difference
logic, but still weaker than full Presburger arithmetic. It would be interesting
to investigate whether decidability for these is preserved in the presence of
uninterpreted unary predicates. Note, however, that our proof of decidability strongly
relies on the translation of the constraints into the first-order theory of order over R,
with unary predicates. This translation is not suitable for, e.g., constraints of the form
$x + y \bowtie 0$, where x and y are variables and $\bowtie \in \{<, \le, =, \ge, >\}$.

In another result, we established the undecidability of the fragment UF1-RDL, 
where uninterpreted unary predicates and difference logic constraints between 
real variables are allowed. It is worth mentioning that this result can be adapted 
straightforwardly to the same logic interpreted over the domain Q. 

Our long-term goal is to design an effective decision procedure for the
decidable fragment. Complexity results have been established [6,13,14] for the
temporal logic counterpart of the theory of order, to which we reduce the decidability of
our fragment of interest. We are currently designing a decision procedure relying
on the concept of automata on linear orderings introduced in [3]. We hope that
the insight we obtained through this decision procedure will eventually guide the
design of new powerful instantiation techniques for SMT in a more expressive
context, and that these techniques will happen to be complete in particular for
this decidable fragment.


Acknowledgments. We are thankful to Tanja Schindler and the reviewers of this 
paper and of our previous work-in-progress workshop paper for their comments. 

Abstract. Rewriting Modulo SMT combines two powerful automated
deduction techniques: (1) rewriting and (2) SMT-solving. Rewriting
enables the specification of the behavior of systems using rewriting rules,
while SMT theories specify system properties. Rewriting Modulo SMT
is enabled by combining existing tools, such as Maude and SMT solvers.
Search algorithms used for carrying out Rewriting Modulo SMT, however,
cannot exploit the incremental solving features available in SMT
solvers, as they are based on breadth-first search. This paper addresses
this limitation by proposing Incremental Rewriting Modulo SMT Theories,
which is a syntactical restriction on rewriting rules. This restriction
turns out to arise naturally in several applications of Rewriting
Modulo SMT, including the verification of algorithms, cyber-physical
systems, and security protocols. Moreover, we propose a Hybrid-Search
algorithm for Incremental Rewriting Modulo SMT Theories that combines
breadth-first search and depth-first search, thus enabling incremental
SMT-solving. We demonstrate through a collection of existing
benchmarks that the Hybrid-Search algorithm can achieve a tenfold
performance improvement in verification times.


1 Introduction 


Rewriting modulo SMT [14] is the result of the combination of two powerful
automated deduction methods: rewriting logic and SMT-solving. It is supported
by the integration [11] of powerful tools, such as Maude [6] and Z3 [8]. During
rewriting, a set of constraints on the symbols appearing in a term is generated.
These constraints can be, for example, non-linear arithmetic constraints that
specify possible values that can be assumed by the configuration parameters.
Demonstrating properties of such specifications amounts to searching with these
rewrite rules and checking the satisfiability of the accumulated constraints using
SMT solvers. Rewriting modulo SMT has been successfully applied in
case studies from several domains, including safety of cyber-physical systems
(CPSes) [13], verification of algorithms [2], and network security analysis [16].

One important aspect that has not been addressed until now is how to exploit
an SMT solver's capability of incrementally solving problems. In this solving
method, instead of checking the satisfiability of a formula from scratch, the solver
re-uses data computed by prior checks. For example, if the satisfiability
of a formula b has been checked, the check on $b \land b_I$ may re-use the intermediate
results obtained while checking the satisfiability of b. It has been shown
that incremental solving can improve performance by a factor of 2 to 5 [10].¹

© The Author(s) 2023

The search algorithms used to implement rewriting modulo SMT are similar
to those implemented in the Maude search engine [6]. They use a breadth-first
search (BFS) algorithm with memoization techniques in order to improve
performance. This type of search seems incompatible with incremental solving, as
constraints appearing in different branches of the search tree are generated under
different conditions. Thus, it is hard to define what the increment ($b_I$ mentioned
above) would be.

This paper’s goal is to enable rewriting modulo SMT that can exploit incre- 
mental solving. To achieve this, we make the following contributions: 


– Incremental Rewriting Modulo SMT by identifying a class of rewrite
rules that are amenable to incremental solving. More specifically, rewrite rules
are applied to terms containing symbols paired with a set of boolean terms
constraining the values of these symbols. Moreover, any rewrite rule can only
add new constraints, i.e., not change the existing set of constraints on the term
that is being rewritten. We show that a variety of theories used in published
case studies can be seen to be amenable to incremental solving.

– A Hybrid Search Algorithm for Incremental Theories which combines
breadth-first and depth-first search (DFS) strategies. The combination is
parameterized by a level of depth parameter which specifies how many depth-first
search steps shall be performed before switching to a breadth-first search. The
proposed hybrid search algorithm enjoys the benefits of BFS, namely better
coverage as it alternates through different branches of the search tree, and
the benefits of DFS, namely incremental solving.


We carried out a collection of experiments (the case studies mentioned above)
on algorithm verification, cyber-physical systems verification, and network
security analysis. The experiments show that in all these benchmarks, the hybrid
search algorithm outperforms current BFS techniques, in some experiments
achieving a tenfold performance improvement.

Section 2 illustrates the problems of existing BFS methods for Rewriting
Modulo SMT and proposes Incremental Rewriting Theories, which formalize the
notion of increments. Section 3 describes the proposed Hybrid algorithm,
illustrating how it enables incremental SMT solving. Section 4 describes experiments
that compare different search mechanisms (BFS, DFS, and Hybrid) on existing
benchmarks from the literature. Finally, we conclude by discussing Related Work
in Sect. 5 and Future Work in Sect. 6.


2 Incremental Rewriting Modulo SMT 
Rewriting logic [12] is a logical formalism that is based on two ideas: states of
a system are represented as elements of an algebraic data type, specified in an
equational theory, and the behavior of a system is given by local transitions
between states described by rewrite rules. A rewrite rule has the form $t \to t'$ if
b, where t and t' are terms possibly containing variables and b is a condition (a
boolean term). Such a rule applies to a system in state s (a ground term) if t can
be matched to a part of s by supplying the right values for the variables, and if
the condition b holds when supplied with those values. In this case, the rule can
be applied by replacing the part of s matching t by t' using the matching values
for the variables in t'.


¹ However, incremental solving can also reduce performance, depending on the theories
that are used.
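
As a rough illustration of how a conditional rule t → t' if b is applied, the following Python sketch matches a pattern against a ground term, checks the condition on the resulting substitution, and instantiates the right-hand side. It is a simplification under stated assumptions: terms are nested tuples, variables are strings starting with "?", matching is purely syntactic and only at the root of the state (no matching modulo axioms, unlike Maude). All names are hypothetical.

```python
def match(pattern, term, subst=None):
    """Syntactic matching: return a substitution making pattern equal to term, or None."""
    subst = dict(subst or {})
    if isinstance(pattern, str) and pattern.startswith("?"):   # a variable
        if pattern in subst and subst[pattern] != term:
            return None                                        # conflicting binding
        subst[pattern] = term
        return subst
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def instantiate(term, subst):
    """Replace every variable in term by its value in subst."""
    if isinstance(term, str) and term.startswith("?"):
        return subst[term]
    if isinstance(term, tuple):
        return tuple(instantiate(t, subst) for t in term)
    return term

def apply_rule(lhs, rhs, cond, state):
    """Apply 'lhs -> rhs if cond' at the root of state; None if not applicable."""
    subst = match(lhs, state)
    if subst is None or not cond(subst):
        return None
    return instantiate(rhs, subst)
```

For instance, the rule ("pair", "?x", "?y") → ("pair", "?y", "?x") with the condition x ≠ y rewrites ("pair", 1, 2) to ("pair", 2, 1) and does not apply to ("pair", 3, 3).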

Maude is a language and tool based on rewriting logic [6]. Maude provides 
a high performance rewriting engine featuring matching modulo associativity, 
commutativity, and identity axioms; and search and model-checking capabilities. 
Thus, given a specification S of a concurrent system, one can execute S to find 
one possible behavior; use search to see if a state meeting a given condition can 
be reached; or model-check S to see if a temporal property is satisfied, and if
not, to see a computation that is a counterexample. 

Symbolic rewriting modulo SMT [13,14] allows rewriting symbolic states
(t, b), where t is a term possibly containing variables and b a boolean term
constraining the allowed values of variables of t. The symbolic state (t, b) represents
the set of (concrete) states that are instances of t such that the instantiating
substitution satisfies b. Thus a rewrite to a symbolic state (t', b') such that b'
is not satisfiable represents the empty set of concrete rewrites, and satisfiability
can be checked at each step to avoid useless work. This is independent of checking
that a goal is satisfied by a symbolic state. To implement symbolic rewriting in
Maude, variables are replaced by symbols, treated as constants by Maude, and
translated as variables when using an SMT solver to check satisfiability of the
constraint. Symbolic rewriting allows us to reason about open systems, and to
reason about all (possibly infinitely many) instances of a configuration.

Verification problems are expressed as reachability problems, stated in
the form


$$\mathit{search}\ (t_0, b_0) \Rightarrow (t', b')\ \text{such that}\ \mathit{goalCond}(t', b')$$


where (t’, b’) is a pattern and goalCond is a boolean function that checks whether 
a state satisfies some condition. Typically, goalCond(t’, b’) also makes calls to the 
SMT solver to check whether some constraints derived from b’ are satisfiable. 

As illustrated by Fig. 1, Rewriting Modulo SMT implementations [11]
traverse the search tree derived from the rewrite rules using BFS-based algorithms.
At each step, e.g., $(t_0, b_0) \to (t_1, b_1)$, the engine checks for the satisfiability of
the condition $b_1$. If the check fails, then search backtracks following the BFS
strategy. Otherwise, if the check succeeds, then the engine checks (1) whether $(t_1, b_1)$
matches the pattern (t', b') and (2) if this is the case, it checks the condition
$\mathit{goalCond}(t_1, b_1)$, which may make further calls to the SMT solver, written as
$SMT(\mathit{goalCond}(t_i, b_i))$. If goalCond returns true, then a solution for the
reachability problem is found. Otherwise, the algorithm continues the search following
BFS.

From the sequence of calls to the SMT solver, one can observe the following 
difficulties of exploiting incremental SMT solving when using BFS based search 
strategy: 
[Figure 1: search tree with nodes numbered in traversal order, shown next to the
sequence of SMT calls using BFS: SMT(b_0), SMT(goalCond(t_0, b_0)), SMT(b_1),
SMT(goalCond(t_1, b_1)), SMT(b_2), SMT(goalCond(t_2, b_2)).]

Fig. 1. Illustration of the search tree and SMT-calls when using Rewriting Modulo
SMT following a BFS algorithm. The sequence of SMT-calls of a BFS algorithm is
depicted to the left, where $SMT(\mathit{goalCond}(t_i, b_i))$ denotes possible SMT-calls required
by the goal condition goalCond. The numbers inside the circles specify the order in
which nodes are traversed.


– Definitions of Increments: Given the generality of the accepted theory,
it is not possible for the search engine to determine whether constraints
$b_1$ and $b_2$, used in subsequent calls $SMT(b_1)$ and $SMT(b_2)$ to the SMT solver, are
constructed using some increment, i.e., whether $b_2 = b_1 \land b_{1,2}$. This is because
$b_1$ and $b_2$ are derived by applying different instances of rules, which normally
add/modify constraints in different ways.

– Not possible to chain incremental calls: As it is not possible to define
increments when using rewrite rules in general, it is not possible to
effectively use incremental solving by chaining calls, such as in
$SMT(b_1);\ SMT(b_1 \land b_{1,2});\ SMT(b_1 \land b_{1,2} \land b_{2,3});\ \ldots$


To address this problem, we introduce a special class of rewrite theories, 
called Incremental Rewrite Theories. 


Definition 1. An incremental rewrite theory is a rewrite theory specification
$(\Sigma, E, R)$ where $\Sigma$ is a typed alphabet; E is an equational theory; and R is a set
of rewrite rules of the forms:


$$(t, b) \to (t_1, b \land b_I) \quad \text{and} \quad (t, b) \to (t_1, b \land b_I)\ \text{if}\ \mathit{cond}$$


where $t, t_1$ are well-formed terms; $b, b_I$ are boolean formulas (in a given theory);
and cond is a conjunction of equations.²


² The rule on the left is an unconditional rewrite rule that can be applied whenever
it matches a subterm of the current state. The rule on the right is conditional: cond
specifies conditions under which the rule can be applied. The condition is checked
using the equational theory to determine if the equations are satisfied by a
candidate matching substitution. The term (t, b) represents a set of values, namely all
instances for which the constraint b is true. A constraint solver is used to determine
if b is satisfiable, that is, if the set of values is non-empty. In brief, the difference
between b and cond is how they are used in reasoning.
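
The shape of an incremental rule, where the constraint of the successor state is the old constraint conjoined with an increment, can be sketched in Python as follows. This is only an illustration: constraints are modelled as Python predicates over an environment, and satisfiability is decided by brute force over a tiny finite domain rather than by an SMT solver. SymState, apply_incremental and satisfiable are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class SymState:
    term: tuple
    constraints: tuple      # the conjunction b, as a tuple of predicates over an env dict

def apply_incremental(state, new_term, increment):
    """Rewrite (t, b) to (t1, b and b_I): the old constraints are kept verbatim."""
    return SymState(new_term, state.constraints + tuple(increment))

def satisfiable(constraints, symbols, domain=range(0, 6)):
    """Brute-force stand-in for an SMT call over a tiny finite domain."""
    for values in product(domain, repeat=len(symbols)):
        env = dict(zip(symbols, values))
        if all(c(env) for c in constraints):
            return True
    return False
```

Because a rule can only append to the constraint tuple, the constraint of every successor has the constraint of its predecessor as a prefix, which is exactly what an incremental solver call can exploit.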
The verification problem for incremental rewrite theories is a specialized
reachability problem as defined below.


Definition 2. Let T be an incremental rewrite theory. An incremental
reachability problem over T is of the form:


$$\mathit{search}\ (t_0, b_0) \Rightarrow (t', b')\ \text{such that}\ \mathit{goalTerm}(t')\ \text{and}\ SMT(b' \land b_I)$$


where goalTerm is a function that takes a term and returns a boolean value and
$b_I = \mathit{goal}(t')$ is a boolean formula constructed from t'.


The following three examples illustrate how incremental theories can model 
different types of systems. These examples are based on specifications from the 
literature [2,13,16]. For ease of exposition, we simplify the rules in the descrip- 
tion below. In Sect. 4, the full specifications from the literature are used in our 
experiments. 


Example 1. This example is based on the work [2] for verification of the CASH 
scheduling algorithm [4]. In this algorithm, each task has a worst-case execution 
time. Whenever a task is completed before its deadline, the unused processing 
time is added to a global queue of unused budget, which can then be used by 
other tasks. Rewriting modulo SMT has been used to verify whether it is possible 
for a task to miss its deadline [2]. In particular, constraints keep track of the 
processing times and the available time budgets. 

It turns out that the specification of this algorithm as rewrite rules and 
the verification problem are an incremental rewrite theory and an incremental 
reachability problem, respectively. For example, the following rule specifies when 
a deadline is missed: 


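$$\begin{aligned}
& (\langle id_1 : \mathit{global} \mid \mathit{deadlineMiss} : \mathit{missStat},\ \mathit{Ats} \rangle \\
& \ \ \langle id_2 : \mathit{server} \mid \mathit{state} : st,\ \mathit{usedBudget} : t,\ \mathit{timeDeadline} : t_1,\ \mathit{maxBudget} : n \rangle\ \mathit{rest},\ b) \\
& \to (\langle id_1 : \mathit{global} \mid \mathit{deadlineMiss} : \mathit{true},\ \mathit{Ats} \rangle \\
& \ \ \langle id_2 : \mathit{server} \mid \mathit{state} : st,\ \mathit{usedBudget} : t,\ \mathit{timeDeadline} : t_1,\ \mathit{maxBudget} : n \rangle\ \mathit{rest},\ b \land b_I) \\
& \ \ \text{if}\ (st = \mathit{waiting} \lor st = \mathit{executing})
\end{aligned}$$


where rest is the specification of the remaining tasks, Ats are other attributes of
the server, and $b_I$ is the set of constraints $t > 0 \land t_1 > 0 \land n > 0 \land (n - t) > t_1$. This
rule specifies that the deadline is missed if there is a task $id_2$ that is not finished,
i.e., either waiting or executing, such that the time to finish ($t_1$) cannot be met
by the available time budget n - t required by the task.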

The verification problem of checking whether, for some given configuration
$(t_0, b_0)$ of server and tasks, a task can miss its deadline is specified by the
following search command, which is an incremental reachability problem:


$$\mathit{search}\ (t_0, b_0) \Rightarrow (\langle id_1 : \mathit{global} \mid \mathit{deadlineMiss} : \mathit{true},\ \mathit{Ats} \rangle\ \mathit{rest},\ b')\ \text{such that}\ SMT(b')$$


Example 2. Rewriting Modulo SMT has been used for verifying whether
resource-bounded intruders can slowly deny access to webservers [16]. This type of attack
was inspired by application layer DDoS attacks such as Slowloris [7], where the
attacker attempts to exhaust all the resources of a webserver by periodically
sending bursts of multiple requests. When receiving such bursts of requests, the
webserver has to allocate resources for at least some period of time, called
timeout. As the webserver has limited resources, the attacker is capable of denying
service to legitimate users by sending enough bursts.

Constraints were used in previous work [16] to keep track of (1) the number
of resources available to the webservers, and (2) the timeout period of bursts.
While we refer to the previous work [16] for the complete formalization, we
illustrate the incrementality of such specifications with a simplified version of
the protocol initialization rule from reference [16].


$$\begin{aligned}
& ([iid \mid pxs \mid ri \mid \mathit{Trec}]\ [sid \mid pxs' \mid rs],\ b) \to \\
& ([iid \mid px(num, rp)\ pxs \mid ri' \mid \mathit{Trec}]\ [sid \mid px(num, rp)\ pxs' \mid rs'],\ b \land b_I)
\end{aligned}$$


This rule specifies that the intruder iid with ri resources creates a new burst of
protocol session instances px(num, rp) with num instances each using rp resources,
where num is a symbol. These instance requests are received by the server sid,
which has rs resources. The resources of the intruder, ri, and the resources of
the server, rs, are updated to the fresh symbols ri' and rs'. These symbols are
constrained by the boolean increment $b_I$ defined as
$ri' = (ri - num \times rp) \land rs' = (rs - num \times rp) \land num > 0 \land ri' > 0$. Similar rules specify when the protocol
sessions time out and are cleaned up by the server, thus releasing resources.

The verification property is to check whether a bounded intruder with some
limited number of resources ri can deny service by consuming the resources of the
server sid. This can be expressed by an incremental reachability property as
follows, where $(t_0, b_0)$ specifies the initial condition when all intruder and server
resources are free:


$$\mathit{search}\ (t_0, b_0) \Rightarrow ([iid \mid pxs \mid ri \mid \mathit{Trec}]\ [sid \mid pxs' \mid rs],\ b')\ \text{such that}\ SMT(b' \land b_I)$$


where $b_I$ is the constraint rs < 0 specifying that the resources of the server sid
are depleted.


Example 3. This example of verification of cyber-physical systems (CPSes) is
based on reference [13]. A CPS is represented by a set of agents $(ag_1, \ldots, ag_n)$
that interact with the environment (env) to achieve some goal while not violating
properties, such as the minimum distance to other objects.

Constraints are used to specify the physical attributes of an agent ag, such as its
position, at(ag, (x, y)), speed, spd(ag, v), acceleration, acc(ag, acc), and direction,
dir(ag, dir). The evolution of a system with one agent can be
specified by the following incremental rule when assuming, for simplicity, that
the agent's direction is on the x-axis.


$$\begin{aligned}
& ([env \mid at(ag, (x, y)),\ spd(ag, v),\ acc(ag, acc),\ dir(ag, dir),\ kb]\ \mathit{conf},\ b) \to \\
& ([env \mid at(ag, (x_1, y_1)),\ spd(ag, v_1),\ acc(ag, acc),\ dir(ag, dir),\ kb]\ \mathit{conf},\ b \land b_I)
\end{aligned}$$

Here kb is the set of other knowledge-base elements, conf contains the agent's
internal representation, $x_1, y_1, v_1$ are fresh symbols and $b_I$ is the set of constraints:
$x_1 = (x + (v_1 + v) \times dt/2) \land y_1 = y \land v_1 = v + acc \times dt$. These constraints specify
the agent's new position and speed using classical physics equations.
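
To make the increment concrete, the following Python sketch evaluates the update on numbers, reading the constraints as v1 = v + acc × dt, x1 = x + (v + v1) × dt / 2 and y1 = y. It works on concrete floating-point values rather than on symbolic constraints, and the function name advance is hypothetical.

```python
def advance(x, y, v, acc, dt):
    """One step of motion along the x-axis: returns (x1, y1, v1)."""
    v1 = v + acc * dt                 # new speed under constant acceleration
    x1 = x + (v + v1) * dt / 2        # displacement uses the average speed
    y1 = y                            # no motion along the y-axis
    return x1, y1, v1
```

Starting at x = 0 with speed 2, acceleration 1 and dt = 2, the agent reaches speed 4 and position 6.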

The verification property bad, in which an agent is too close to an obstacle, such
as a pedestrian, is specified by the search command:


$$\begin{aligned}
& \mathit{search}\ (t_0, b_0) \Rightarrow \\
& ([env \mid at(ag_1, (x_1, y_1)),\ at(ag_2, (x_2, y_2)),\ kb]\ \mathit{conf},\ b')\ \text{such that}\ SMT(b' \land b_I)
\end{aligned}$$


where $b_I$ is the set of constraints $x_1 = x_2 \land y_1 = y_2$, specifying that two agents
$ag_1$ and $ag_2$ are in the same position, i.e., colliding.


3 Hybrid BFS-DFS Algorithm 


The definition of Incremental Rewrite Theories addresses the problem of the
Definition of Increments discussed above. The second problem (Not possible
to chain incremental calls) still needs to be addressed. Indeed, BFS
procedures do not enable the chaining of incremental calls. To illustrate this,
consider again the search tree and BFS execution in Fig. 1. Assume that
$b_1 = b_0 \land b_{0,1}$, $b_2 = b_0 \land b_{0,2}$ and that goalCond(t, b) has the form $b \land b_I$, as
one would expect when using incremental rewrite theories. It is possible to
call the SMT solver incrementally during the sequence of calls $SMT(b_1)$ and
$SMT(\mathit{goalCond}(t_1, b_1))$, but not to chain incrementally the call $SMT(b_2)$. This is
because it is not possible to define an increment between $b_1$ and $b_2$, as they lie
in different branches of the search tree.

The first obvious alternative is using Depth-First Search (DFS) instead of
BFS. This would indeed lead to an execution that could chain incremental calls
to the SMT solver. For example, in the tree depicted in Fig. 1, the sequence of
calls would be


$$SMT(b_0);\ SMT(\mathit{goalCond}(t_0, b_0));\ SMT(b_1);\ SMT(\mathit{goalCond}(t_1, b_1));\ SMT(b_3);\ SMT(\mathit{goalCond}(t_3, b_3))\ \ldots$$


Since $b_3$ is of the form $b_0 \land b_{0,1} \land b_{1,3}$, we know the increment is $b_{1,3}$. There are,
however, two problems with DFS. The first problem is that DFS may not find
a solution that could be found using BFS due to an infinite branch. The second
problem is that the sequence of calls using goalCond(t, b) appears in between the
increments, e.g., $SMT(b_0);\ SMT(\mathit{goalCond}(t_0, b_0));\ SMT(b_1)$.
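
The difference between the two traversal orders can be made concrete with a small Python sketch. It uses a hypothetical seven-node binary tree (not the exact tree of Fig. 1) and counts how many consecutive solver calls check a node directly after its parent, which is the situation where the second constraint is the first one plus an increment. Calls for the goal condition are left out for brevity.

```python
from collections import deque

# Hypothetical search tree: node -> list of children.
TREE = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}
PARENT = {c: p for p, cs in TREE.items() for c in cs}

def bfs_order(root):
    """Order in which BFS checks the node constraints."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(TREE[node])
    return order

def dfs_order(root):
    """Order in which DFS checks the node constraints (children left to right)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(TREE[node]))
    return order

def incremental_pairs(order):
    """Count consecutive checks where the second node is a child of the first."""
    return sum(1 for a, b in zip(order, order[1:]) if PARENT.get(b) == a)
```

On this tree BFS yields one such parent-to-child pair and DFS yields three; backtracking scopes allow DFS to also reuse the state of an ancestor after returning from a subtree.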

We propose the algorithm hybrid_search described in Fig. 2 that addresses 
these two problems of DFS by combining BFS and DFS and using the PUSH 
and POP features of SMT solvers for incremental solving. These features enable 
the creation of backtracking scopes of learned clauses. By default, sequential 
calls to SMT will attempt to use incremental solving based on the constraints 
solved in previous calls. A call to PUSH will add to the solver stack any learned 
clauses from calls to SMT while a call to POP will remove any learned clauses 
since the last PUSH. 
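
The scoping behaviour of PUSH and POP can be illustrated with a toy solver written in plain Python. This is a sketch under strong assumptions: it only keeps a stack of asserted constraints and decides satisfiability by brute force over a small finite domain, so it models the scoping discipline but none of the clause learning of a real SMT solver. ToySolver and its methods are hypothetical names.

```python
from itertools import product

class ToySolver:
    """Assertion stack with PUSH/POP scopes; check() is brute force, not SMT."""

    def __init__(self, symbols, domain=range(0, 4)):
        self.symbols, self.domain = list(symbols), list(domain)
        self.assertions = []       # constraints currently asserted
        self.scopes = []           # length of self.assertions at each PUSH

    def push(self):
        self.scopes.append(len(self.assertions))

    def pop(self):
        # Discard everything asserted since the matching PUSH.
        del self.assertions[self.scopes.pop():]

    def add(self, constraint):
        self.assertions.append(constraint)

    def check(self):
        for values in product(self.domain, repeat=len(self.symbols)):
            env = dict(zip(self.symbols, values))
            if all(c(env) for c in self.assertions):
                return "sat"
        return "unsat"
```

Asserting x < y, then pushing a scope and adding y = 0 makes the stack unsatisfiable; a single pop restores the earlier satisfiable state without re-asserting x < y.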
 1: Queue : FIFO Queue
 2: Solver : SMT Solver
 3: found : Node
 4: function hybrid_search(tree, depth, goal)
 5:     found ← NULL
 6:     push root of tree on Queue
 7:     while Queue has elements and found is NULL do
 8:         node ← Queue.pop()
 9:         dfs_bounded(node, depth, goal, 0)
10:     end while
11: end function
12: function dfs_bounded(node, max_depth, goal, curr_depth)
13:     b ← node.getBoolean()
14:     Solver.push()
15:     rsat ← Solver.check(b)
16:     if rsat is UNSAT then
17:         Solver.pop()
18:         return
19:     end if
20:     if goal(node) then
21:         found ← node
22:         return
23:     end if
24:     if curr_depth = max_depth then
25:         for all child ∈ node.children() do
26:             Queue.add(child)
27:         end for
28:         return
29:     end if
30:     for all child ∈ node.children() do
31:         dfs_bounded(child, max_depth, goal, curr_depth + 1)
32:         Solver.pop()
33:     end for
34: end function


Fig. 2. Pseudo-code of the Hybrid Search Algorithm hybrid_search.
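
A runnable approximation of the pseudo-code can be written in Python. The sketch below is not the authors' implementation and deviates from Fig. 2 in several labelled ways: it returns the found node instead of setting a global variable, it pops the scope on every exit path, the goal is a plain predicate on nodes rather than an SMT query, the tree is given explicitly, and satisfiability is decided by brute force over a small finite domain. The Node class and all names other than hybrid_search and dfs_bounded are hypothetical.

```python
from collections import deque
from itertools import product

class Node:
    def __init__(self, name, constraints, children=()):
        self.name = name
        self.constraints = tuple(constraints)   # full conjunction b of the node
        self.children = list(children)

def brute_force_sat(assertions, symbols, domain):
    """Stand-in for an SMT check: enumerate a tiny finite domain."""
    for values in product(domain, repeat=len(symbols)):
        env = dict(zip(symbols, values))
        if all(c(env) for c in assertions):
            return True
    return False

def hybrid_search(root, max_depth, goal, symbols, domain=range(0, 8)):
    queue = deque([root])
    stack = []            # asserted constraints, shared by nested scopes
    calls = []            # (node name, number of newly asserted constraints)

    def dfs_bounded(node, depth, inherited):
        mark = len(stack)                              # PUSH
        stack.extend(node.constraints[inherited:])     # assert only the increment
        calls.append((node.name, len(stack) - mark))
        try:
            if not brute_force_sat(stack, symbols, domain):
                return None                            # UNSAT: prune this subtree
            if goal(node):
                return node
            if depth == max_depth:
                queue.extend(node.children)            # resume later, BFS style
                return None
            for child in node.children:
                found = dfs_bounded(child, depth + 1, len(node.constraints))
                if found is not None:
                    return found
            return None
        finally:
            del stack[mark:]                           # POP

    while queue:
        found = dfs_bounded(queue.popleft(), 0, 0)
        if found is not None:
            return found, calls
    return None, calls
```

The calls list records how many constraints each check had to assert. Nodes reached by a DFS step add only their increment, whereas nodes taken from the queue start from an empty solver stack and must assert their full constraint again, which is the trade-off controlled by the depth parameter.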
The hybrid_search algorithm takes as input the search tree T³, a non-negative
natural number d, and a goal condition g. Intuitively, the parameter d specifies
the depth to which the algorithm shall perform DFS before switching to BFS.

We start with Queue empty and a Solver. hybrid_search starts at line 4, with
the next few lines initializing found to NULL and pushing the root of T onto
Queue. The while loop starts at line 7 and continues while Queue is nonempty
and no solution has been found. It pops the next node off Queue on line
8, then on line 9 calls dfs_bounded, which is defined from line 12 onwards,
with this node as the root. dfs_bounded is a modified depth-bounded depth-first search. It starts


3 Notice that in practice, there is a mechanism that constructs the tree on the fly. 
[Figure 3: a search tree annotated with the sequence of PUSH, SMT and POP calls issued by hybrid_search with depth 2 and goal g. Each node constraint has the form $b_j = b_i \land b_{i,j}$, where node j is a child of node i, e.g., $b_1 = b_0 \land b_{0,1}$, $b_2 = b_0 \land b_{0,2}$, $b_3 = b_1 \land b_{1,3}$, $b_4 = b_1 \land b_{1,4}$.]


Fig. 3. Illustration of a hybrid_search algorithm execution using the goal condition g
and depth two. The POP surrounded by a box indicates the points when the algorithm
back-tracks in the search tree. The numbers inside the circles specify the order in which
nodes are traversed.


by creating a backtracking scope on Solver by calling PUSH and storing the
result of SMT(b), where b is the boolean constraint of the current node.

Subsequently, in line 16, it checks whether SMT(b) returned UNSAT; if so, we
POP and return immediately without exploring any children of this node. Any
descendant node would have a boolean constraint of the form $b \land b'$ for some
$b'$, and since SMT(b) is UNSAT, $b \land b'$ must also be UNSAT.
Otherwise, line 20 checks whether goal(node) is true and, if so, sets
found to this node and terminates dfs_bounded and hybrid_search.
If found is not set, line 24 checks whether the current depth equals the
depth parameter d. If it does, we add all of the child nodes, i.e., all the nodes
at depth d + 1 below the root node passed in on line 9, to Queue,
and no deeper nodes are visited for now. After all such nodes
are added, the execution eventually returns to line 7 to start another dfs_bounded from the
next element in Queue. Otherwise, it continues traversing the tree in a DFS-like
manner on line 30, ensuring that we call POP for each node when dfs_bounded
backtracks, so that Solver can properly unlearn clauses
that it no longer needs.
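The walkthrough above can be turned into a small runnable sketch. Everything here is an assumption made for illustration: ToySolver is a brute-force stand-in for an SMT solver over one integer x in 0..15, Node and path_increments are hypothetical helpers, and, unlike Fig. 2, the POP is issued inside dfs_bounded itself so that every PUSH is matched exactly once.

```python
from collections import deque


class ToySolver:
    """Hypothetical stand-in for an SMT solver over one integer x in 0..15."""

    def __init__(self):
        self.scopes = [[]]                       # one predicate list per scope

    def push(self):
        self.scopes.append([])

    def pop(self):
        self.scopes.pop()

    def add(self, predicate):
        self.scopes[-1].append(predicate)

    def check(self):
        predicates = [p for scope in self.scopes for p in scope]
        return any(all(p(x) for p in predicates) for x in range(16))


class Node:
    def __init__(self, name, increment, children=()):
        self.name, self.increment = name, increment
        self.children, self.parent = list(children), None
        for child in self.children:
            child.parent = self

    def path_increments(self):
        # The full constraint b is the conjunction of increments from the root.
        node, increments = self, []
        while node is not None:
            increments.append(node.increment)
            node = node.parent
        return increments[::-1]


def hybrid_search(root, depth, goal):
    solver, queue, visited, found = ToySolver(), deque([root]), [], None

    def dfs_bounded(node, curr_depth):
        nonlocal found
        solver.push()
        try:
            if curr_depth == 0:
                for increment in node.path_increments():
                    solver.add(increment)        # restart: assert all of b
            else:
                solver.add(node.increment)       # incremental: new conjunct only
            visited.append(node.name)
            if not solver.check():
                return                           # UNSAT prunes the subtree
            if goal(node):
                found = node
                return
            if curr_depth == depth:
                queue.extend(node.children)      # hand deeper nodes to the queue
                return
            for child in node.children:
                dfs_bounded(child, curr_depth + 1)
                if found is not None:
                    return
        finally:
            solver.pop()                         # each PUSH is matched by one POP

    while queue and found is None:
        dfs_bounded(queue.popleft(), 0)
    return found, visited


always = lambda x: True
leaves = [Node(3, lambda x: x < 3), Node(4, lambda x: x < 10),
          Node(5, always), Node(6, always)]
root = Node(0, always, [Node(1, lambda x: x > 5, leaves[:2]),
                        Node(2, lambda x: x < 4, leaves[2:])])
is_goal = lambda node: node.name == 6

found_bfs, order_bfs = hybrid_search(root, 0, is_goal)
found_dfs, order_dfs = hybrid_search(root, 2, is_goal)
```

Node 3 is unsatisfiable in both runs (x > 5 and x < 3), so it is visited but never expanded; only the order of visits differs between depth 0 and depth 2.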

We illustrate the execution of hybrid_search with the tree shown in Fig. 3. It
also contains the sequence of calls to PUSH, POP and SMT due to the initial
call to dfs_bounded. The sequence of calls illustrates the chaining of incremental
calls to the SMT solver. For example, the data-structures constructed in the call
SMT($b_1$) are used in the SMT calls for $b_3$, $b_4$, including the calls goal($b_3$) and
goal($b_4$). This makes sense as $b_1$ is a subformula of $b_3$, $b_4$, goal($b_3$) and goal($b_4$).
However, the data-structures constructed in the SMT call for goal($b_1$) are not
stored due to the subsequent POP call, as goal($b_1$) is not necessarily a subformula
of $b_3$, $b_4$, goal($b_3$) and goal($b_4$). The second observation is the combination of
DFS and BFS. While the subtree of depth d = 2 is traversed, the algorithm
removes the data-structures constructed during the call of SMT($b_1$), indicated
by the 2 × POP in Fig. 3, as $b_1$ is not necessarily a subformula of $b_2$.
Notice that the depth parameter d determines how much
incremental solving one is willing to use, at the risk of spending longer in a
branch of the search tree that may not have a solution. For example, in the tree
and execution shown in Fig. 3, the algorithm will traverse the node ($t_7$, $b_7$) and
will call SMT($b_7$), but without using the data-structures constructed previously
for $b_3$, that is, it will not solve it incrementally.

The following results relate hybrid_search with BFS and with DFS. 


Proposition 1. Let T be a tree and g be a decidable goal condition. Then, 
hybrid_search(T, 0, g) will traverse T in the same order as BFS. 


Proof Sketch. A DFS search bounded by depth 0 will only traverse a single 
node, the node it starts at. Then, it adds nodes to a FIFO queue in the same 
manner as BFS. Hence, hybrid_search(T,0,g) will traverse T in the same order 
as BFS. QED. 


Proposition 2. Let T be a tree and g be a decidable goal condition. Suppose 
the depth of T is d. Then, for any k ≥ d, hybrid_search(T, k, g) will traverse T
in the same order as DFS. 


Proof Sketch. If k is greater than or equal to the depth of T, then a 
k depth-bounded DFS from the root node would traverse all of T. Hence, 
hybrid_search(T, k, g) traverses T in the same order as DFS. QED.
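Propositions 1 and 2 can be checked on a small example. The sketch below ignores constraints and goals and compares only visiting orders; the dictionary-based tree and the helper names are assumptions made for the illustration.

```python
from collections import deque

# A complete binary tree of depth 2, given as an adjacency map.
TREE = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}


def bfs_order(tree, root):
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


def dfs_order(tree, root):
    order = [root]
    for child in tree[root]:
        order.extend(dfs_order(tree, child))
    return order


def hybrid_order(tree, root, depth):
    order, queue = [], deque([root])

    def dfs_bounded(node, curr_depth):
        order.append(node)
        if curr_depth == depth:
            queue.extend(tree[node])     # deeper nodes wait in the FIFO queue
            return
        for child in tree[node]:
            dfs_bounded(child, curr_depth + 1)

    while queue:
        dfs_bounded(queue.popleft(), 0)
    return order


same_as_bfs = hybrid_order(TREE, 0, 0) == bfs_order(TREE, 0)
same_as_dfs = hybrid_order(TREE, 0, 2) == dfs_order(TREE, 0)
```

With depth 0 every bounded DFS visits a single node, so the FIFO queue alone dictates the order; with a depth of at least the tree depth the queue never receives a node that has children to expand.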

The following statement provides coverage guarantees. 


Proposition 3. Let d > 0, T be a tree of finite branching, and g be a decidable
goal condition. Then, hybrid_search(T, d, g) finds a solution in finite time, i.e.,
some node n in T such that g(n) is true, if such a solution exists.


Proof. Let $B_i$ be the number of nodes in T at depth i. Suppose that the solution
node n exists at depth r and no solutions exist at a lower depth. Let $0 < r < qd$
for some q. The first depth-bounded DFS will traverse all nodes up to depth
d. This then adds $B_{d+1}$ nodes to Queue. Running the depth-bounded DFS on
these nodes will traverse all the nodes up to depth 2d. Traversing all nodes up to depth qd would
take $1 + B_{d+1} + B_{d+2} + \dots + B_{qd}$ iterations of depth-bounded depth-first search.
Since n exists at depth $r < qd$ and each $B_i$ is finite since T has finite branching,
n would be found in finite time. QED.
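The coverage guarantee can be seen on a tree with infinite branches, where a plain DFS would descend forever along the leftmost path. The sketch below is an assumption-laden illustration: nodes are tuples of bits generated on the fly, there are no constraints, and the goal is an arbitrary node at depth 3.

```python
from collections import deque


def children(path):
    # Infinite binary tree built on the fly: every node has two children.
    return [path + (0,), path + (1,)]


def hybrid_search(root, depth, goal):
    queue, visited = deque([root]), 0

    def dfs_bounded(node, curr_depth):
        nonlocal visited
        visited += 1
        if goal(node):
            return node
        if curr_depth == depth:
            queue.extend(children(node))   # postpone deeper nodes
            return None
        for child in children(node):
            hit = dfs_bounded(child, curr_depth + 1)
            if hit is not None:
                return hit
        return None

    while queue:
        hit = dfs_bounded(queue.popleft(), 0)
        if hit is not None:
            return hit, visited
    return None, visited


found, steps = hybrid_search((), 2, lambda path: path == (1, 0, 1))
```

Because each bounded DFS explores only finitely many nodes before handing control back to the queue, the goal on the right-hand side of the tree is reached after finitely many visits.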

To address the fact that search trees may have infinite depth, one often
uses bounded search, which searches the tree only up to some given depth d. The
following proposition states that in these cases it is best to deploy hybrid_search 
with depth d to search through all nodes of the sub-tree, provided incremental 
SMT calls are more efficient than SMT calls from scratch. 


Proposition 4. Let T be a tree of finite branching with branching factor b and
g be a decidable goal condition. Let T(d) be the sub-tree of T of depth d with
d > 0. Assume that incremental SMT calls, i.e., using PUSH, take less time
than calls from scratch, i.e., without using PUSH. Then for any d' > 0 such that
d' ≠ d, the time required by hybrid_search(T, d, g) to traverse all nodes in T(d) is
less than the time of hybrid_search(T, d', g) to traverse all nodes in T(d).
Proof. Let 0 < r < 1 be the average performance benefit from incremental
SMT calls and t be the time it takes for non-incremental SMT calls. Let $B_i$ be
the number of nodes at depth i. Since b is finite, each $B_i$ is finite. The time
required by hybrid_search(T, d, g) to traverse all nodes in T(d) is
$t + rtB_1 + rtB_2 + \dots + rtB_d$.

Suppose that 0 < d' < d. Let pd' < d < (p+1)d' for some
p. For hybrid_search(T, d', g) to traverse all nodes in T(d), it must traverse all
nodes in T((p+1)d'): because each dfs_bounded must travel exactly d' depth,
hybrid_search(T, d', g) will traverse only depths that are multiples of d'. Then, the
time required for hybrid_search(T, d', g) is
$t + rtB_1 + \dots + rtB_{d'} + tB_{d'+1} + rtB_{d'+2} + \dots + rtB_{2d'} + \dots + tB_{pd'} + rtB_{pd'+1} + \dots + rtB_{(p+1)d'}$.
There are p + 1 terms that do
not get the benefit from incremental SMT calls for hybrid_search(T, d', g) while
there is 1 term that does not get this benefit for hybrid_search(T, d, g). Hence, the
time required for hybrid_search(T, d, g) to traverse all nodes in T(d) is less than
the time required for hybrid_search(T, d', g) to traverse all nodes in T(d).

Now, suppose that d' > d. Then, for hybrid_search(T, d', g) to traverse all nodes in T(d),
it must traverse all nodes in T(d'). The time required for hybrid_search(T, d', g)
is $t + rtB_1 + rtB_2 + \dots + rtB_{d'}$. But, because d' > d and each $rtB_i > 0$, the
time required for hybrid_search(T, d, g) is less than that for hybrid_search(T, d', g). Hence,
the time required for hybrid_search(T, d, g) to traverse all nodes in T(d) is less
than the time required for hybrid_search(T, d', g) to traverse all nodes in T(d).

Therefore, for any d' ≠ d the time required for hybrid_search(T, d, g) to
traverse all nodes in T(d) is less than the time required for hybrid_search(T, d', g)
to traverse all nodes in T(d). QED.
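The effect described by Proposition 4 can be explored numerically with a simplified cost model. The model below is our own assumption, not the authors' formula: a bounded DFS started at depth s covers depths s to s + d' as in Fig. 2, nodes at the starting depth cost t each, all other nodes cost r·t each, and the tree is a complete tree with branching factor b. The layer indices at which restarts occur may therefore differ from the sums in the proof.

```python
def traversal_cost(b, d_target, d_prime, t=1.0, r=0.5):
    """Cost of covering depths 0..d_target with bounded DFS runs of depth d_prime.

    Assumes a complete tree with b**i nodes at depth i (hypothetical model).
    """
    cost, start = 0.0, 0
    while start <= d_target:
        # One round of bounded DFS runs covers depths start..start + d_prime.
        for depth in range(start, start + d_prime + 1):
            per_node = t if depth == start else r * t   # restart vs. incremental
            cost += per_node * b ** depth
        start += d_prime + 1
    return cost


costs = {k: traversal_cost(2, 4, k) for k in range(0, 9)}
best = min(costs, key=costs.get)
```

For a binary tree and target depth 4, the cheapest setting in this model is d' = 4: smaller values pay for restarts, and larger values pay for layers beyond the target depth.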


4 Implementation and Experiments 


Our implementation is based on Python with the Z3 SMT solver and Maude 
integrated using Python bindings [15] as depicted in Fig.4. The Z3 Solver is 
responsible for checking the incremental satisfiability of constraints using SMT, 
PUSH and POP, while Maude is responsible for executing rewriting rules. The 
Maude bindings allow for loading Maude files into the Python implementation of 
hybrid_search. The search is done with a Python function that repeatedly calls the 
Maude search with one step (Search1) so that the traversal of the search space 
can be controlled. The original Maude specifications were modified to replace 
calls to SMT with calls to functions defined using the Maude hook mechanism 
for attaching external code to function symbols. This mechanism is exposed by 
the Maude Python bindings. There are two types of functions: one that checks
satisfiability while keeping any learned clauses from the check, and one that just
checks without adding any learned clauses. The functions keep track of the SMT
solver state using appropriate calls to PUSH and POP. The implementation is 
available at [17]. 
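The two kinds of hook function can be sketched as follows. This is a hypothetical illustration: ToySolver stands in for the SMT solver, and the names check_and_keep and check_only are not taken from the implementation in [17].

```python
class ToySolver:
    """Hypothetical stand-in for an SMT solver over one integer x in 0..15."""

    def __init__(self):
        self.scopes = [[]]                      # one predicate list per scope

    def push(self):
        self.scopes.append([])

    def pop(self):
        self.scopes.pop()

    def add(self, predicate):
        self.scopes[-1].append(predicate)

    def check(self):
        predicates = [p for scope in self.scopes for p in scope]
        return any(all(p(x) for p in predicates) for x in range(16))


def check_and_keep(solver, predicate):
    # Open a scope and leave it open so that later checks build on this one.
    solver.push()
    solver.add(predicate)
    return solver.check()


def check_only(solver, predicate):
    # Check inside a temporary scope that is discarded right away.
    solver.push()
    solver.add(predicate)
    try:
        return solver.check()
    finally:
        solver.pop()


solver = ToySolver()
kept = check_and_keep(solver, lambda x: x > 5)      # node constraint stays asserted
probe = check_only(solver, lambda x: x > 20)        # goal-style query, then discarded
depth_after = len(solver.scopes)
```

After both calls only the scope opened by check_and_keep remains, which mirrors the distinction between node checks and goal checks in Fig. 3.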

Figures 5, 6 and 7 summarize the experiments carried out using implemen- 
tations available in the literature [3,13,16] for the verification of the systems 
described in Examples 1, 2, and 3. All experiments were run on a Windows 10 
machine, Intel Core i7-10700J, 16 GB of RAM, on Python 3.10.2, using Maude 
[Figure 4: hybrid_search sends SMT, PUSH and POP requests to Z3 and receives SAT or UNSAT; it sends Search1 requests to Maude and receives pairs (t, b).]


Fig. 4. Overview of the implementation used for the experiments using hybrid_search,
the SMT solver Z3 and the rewriting tool Maude.


Python bindings 1.1.2 and Z3 4.11.2.0. We measure the runtime for these three
applications of rewriting modulo SMT to determine the performance gain from
using hybrid_search at various depth parameters compared to BFS and DFS.
Each table shows the initial configuration for the system, then statistics for
BFS, DFS, and hybrid_search at various depths, each search terminating
when it finds a single goal node. The statistics have the form n/m/p, which
specifies the time n in seconds to perform verification, the number of states m
traversed, and the percentage p of verification time required by SMT solving.
DNF indicates that no solution was found within 30 min. For example, in the first
row for cashOK using BFS, the execution time was
6.9 seconds, requiring 91 state traversals while spending 77% of execution time
in Z3.

For our experiments, we used the same subsets of the verification problems 
used in references [3, 13, 16]: 


— cashOK($I_0$, $I_1$, $I_2$, $I_3$, b) and cashBad($I_0$, $I_1$, $I_2$, $I_3$, b) correspond to symbolic
initial configurations of a CASH scheduling problem with two servers (see
Example 1). $I_0$ and $I_1$ specify, respectively, the maximum budget and the
period of the first server, while $I_2$ and $I_3$ specify, respectively, the maximum
budget and period of the second server. b is a constraint on the values of
$I_0$, $I_1$, $I_2$, and $I_3$. cashOK uses a correct implementation of the scheduler,
while cashBad uses an incorrect specification.

— Slowloris($P_1$, $P_2$, DoSDur) corresponds to symbolic initial configurations of a
Slowloris verification problem (see Example 2). $P_1$ specifies the bound on the
number of parallel bursts of symbolic protocols, and $P_2$ specifies the bound
on the number of different types of messages sent in parallel, where $P_2 = 0$
denotes no bound. Moreover, DoSDur specifies the minimum duration for
which the server's resources are depleted in order to consider the DoS attack
successful.

— pedestrian(t, Safer, Safe, Unsafe) specifies a pedestrian crossing scenario problem
where an autonomous vehicle is approaching a pedestrian crossing. The
verification problem is to avoid an unsafe situation. The three levels of safety
are defined according to the parameters Safer > Safe > Unsafe specifying
bounds on the distance between the vehicle and the pedestrian measured
in terms of time to travel. The verification problem is to determine whether a
given vehicle controller cannot reach an unsafe situation within t time units
when starting at a safe situation. The size of a time unit is 0.1 s.


The results for the CASH verification experiments show that hybrid_search 
finishes up to about 10 times faster than BFS and terminates in all cases as 
Init      | BFS             | DFS            | HYBRID d=2     | HYBRID d=4      | HYBRID d=8
cashOK_1  | 6.9 / 93 / 77%  | 0.7 / 9 / 8%   | 0.2 / 37 / 21% | 13 / 117 / 12%  | 0.7 / 9 / 8%
cashOK_2  | 8.0 / 100 / 71% | 3.0 / 12 / 4%  | 0.6 / 52 / 13% | 19 / 118 / 9.0% | 0.9 / 14 / 6%
cashOK_3  | 3.5 / 65 / 84%  | DNF            | 0.1 / 32 / 25% | 0.06 / 9 / 26%  | 1.3 / 28 / 6%
cashBad_1 | 5.9 / 63 / 74%  | 0.7 / 9 / 7%   | 0.2 / 27 / 21% | 12 / 61 / 10%   | 0.7 / 9 / 8%
cashBad_2 | 7.5 / 70 / 69%  | 2.9 / 12 / 4%  | 0.6 / 42 / 13% | 1.8 / 62 / 7.7% | 0.9 / 14 / 6%
cashBad_3 | 2.6 / 39 / 81%  | DNF            | 0.1 / 22 / 26% | 0.06 / 9 / 24%  | 1.4 / 28 / 6%


Fig. 5. CASH Verification Experiments. cashOK_1 = cashOK($I_0$, $I_1$, $I_2$, $I_3$, true),
cashOK_2 = cashOK($I_0$, $I_1$, $I_2$, $I_3$, $I_0 + I_3 > I_1 + I_2$), and cashOK_3 = cashOK($I_0$, $I_1$, $I_2$,
$I_1$, $I_0 + I_2 > I_1$), and mutatis mutandis for cashBad_1, cashBad_2 and cashBad_3.


Init   | BFS               | DFS             | HYBRID d=2       | HYBRID d=3      | HYBRID d=4
Slow_1 | 2.4 / 51 / 88%    | 0.2 / 66 / 39%  | 0.3 / 52 / 56%   | 0.4 / 79 / 57%  | 0.2 / 35 / 41%
Slow_2 | 39.8 / 775 / 83%  | DNF             | 8.0 / 1612 / 44% | 3.1 / 703 / 39% | 13.0 / 3314 / 36%
Slow_3 | 0.5 / 11 / 87%    | 0.06 / 10 / 50% | 0.06 / 9 / 46%   | 0.05 / 8 / 52%  | 0.06 / 9 / 50%
Slow_4 | 1.8 / 29 / 86%    | 0.1 / 27 / 39%  | 0.2 / 27 / 55%   | 0.2 / 34 / 54%  | 0.1 / 20 / 41%
Slow_5 | 19.0 / 147 / 78%  | DNF             | 2.5 / 187 / 44%  | 13 / 118 / 41%  | 3.9 / 261 / 39%


Fig. 6. Slowloris Experiments. Slow_1 = Slowloris(1, 0, 24), Slow_2 = Slowloris(1, 0, 36),
Slow_3 = Slowloris(1, 1, 12), Slow_4 = Slowloris(1, 1, 24), Slow_5 = Slowloris(1, 1, 36).
Init  | BFS               | DFS               | HYBRID d=t       | HYBRID d=2×t     | HYBRID d=3×t
cps_1 | 12.3 / 119 / 62%  | 4.8 / 57 / 68%    | 7.0 / 117 / 65%  | …                | …
cps_2 | 53.4 / 323 / 71%  | 23.0 / 152 / 78%  | 15.6 / 213 / 69% | 28.6 / 232 / 77% | …
cps_3 | 301.0 / 819 / 84% | 97.3 / 387 / 85%  | 63.9 / 429 / 80% | 52.7 / 364 / 82% | …
cps_4 | 12.5 / 119 / 63%  | 4.8 / 57 / 70%    | 7.8 / 118 / 69%  | 6.4 / 99 / 66%   | 4.7 / 57 / 68%
cps_5 | 56.0 / 323 / 72%  | 25.4 / 152 / 80%  | 18.4 / 227 / 71% | 19.8 / 192 / 74% | 23.3 / 152 / 79%
cps_6 | 285.1 / 819 / 83% | 100.9 / 387 / 85% | 60.2 / 424 / 79% | 84.6 / 437 / 85% | 101.4 / 387 / 85%


Fig. 7. Cyber-Physical System Verification Experiments, where cps_1 = pedestrian(3, 3,
2, 1), cps_2 = pedestrian(4, 3, 2, 1), cps_3 = pedestrian(5, 3, 2, 1), cps_4 = pedestrian(3, 4,
2, 1), cps_5 = pedestrian(4, 4, 2, 1), cps_6 = pedestrian(5, 4, 2, 1). The bounds t, 2 × t and
3 × t are determined according to the t parameter of the scenario.


opposed to two of the DFS cases where it does not finish within 30 min. The
overhead of Z3 is reduced from about 70% to 80% with BFS down to 6% to 25% with
hybrid_search. This indicates the effectiveness of the incremental SMT solving
for the types of constraints used in this example.

Similarly, in the Slowloris examples, hybrid_search finishes up to 10 times
faster than BFS and always terminates, while two of the DFS cases do not finish
within 30 min. In these cases the overhead of Z3 is about 80% to 90% in
BFS and about 30% to 60% in hybrid_search, again demonstrating the
effectiveness of the incremental solving. Interestingly, even when there is a much
larger number of states traversed, e.g., in case Slow_2 and HYBRID d = 4 with
3314 states traversed as opposed to 775 states traversed by BFS, the verification
time is one third, from about 40 s to 13 s. This indicates that the main overhead
of BFS is indeed SMT solving.
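The comparison for the second Slowloris configuration can be reproduced directly from the n/m/p entries of Fig. 6. The helper parse_cell is a hypothetical name introduced for this sketch; the two cell strings are the BFS and HYBRID d = 4 entries of that row.

```python
def parse_cell(cell):
    """Parse 'time / states / smt%' into (seconds, states, smt_fraction)."""
    time_s, states, percent = (part.strip() for part in cell.split("/"))
    return float(time_s), int(states), float(percent.rstrip("%")) / 100.0


bfs = parse_cell("39.8 / 775 / 83%")
hybrid = parse_cell("13.0 / 3314 / 36%")

speedup = bfs[0] / hybrid[0]                  # ratio of total verification times
state_ratio = hybrid[1] / bfs[1]              # how many more states hybrid visits
smt_seconds_bfs = bfs[0] * bfs[2]             # time attributed to SMT solving
smt_seconds_hybrid = hybrid[0] * hybrid[2]
```

Even though the hybrid run visits more than four times as many states, the time attributed to SMT solving is much smaller than in the BFS run.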

For the Cyber-Physical System (CPS) Verification experiments, hybrid_search
completes up to about 5 times faster than BFS. The overhead of Z3 does not
change significantly in these experiments, which indicates that the incremental
solving is not as effective as in the other two examples (CASH and Slowloris).
The reason for this may be the non-linear nature of the constraints for CPS
systems, which contrasts with the former two examples that use linear arithmetic
constraints. Despite this, hybrid_search and DFS still outperform BFS because
they need to traverse fewer nodes before finding a goal node.


5 Related Work 


We consider three related areas of work in optimizing symbolic execution modulo
SMT: hybrid search strategies, incremental constraint solving methods, and
tradeoffs between search space and constraint complexity.


Hybrid Search Strategies. Others have previously explored
techniques for combining BFS and DFS so as to take advantage of the
benefits of both while reducing the drawbacks of each.

Reference [5] proposes a hybrid algorithm for Binary Decision Diagrams
(BDDs). BDDs are often used to represent and manipulate boolean functions
symbolically. Traditionally, depth-first approaches were used in the construction
of BDDs as they had relatively low memory overhead. It was later discovered,
however, that a breadth-first approach had better performance due
to better memory access locality, at the cost of larger memory overhead. To
improve upon both approaches a hybrid of the two is used. Essentially, the algorithm
switches between the two techniques based on its memory overhead. When
the memory overhead is computed to be low, a breadth-first search is used and
when it is high a depth-first search is used.

Reference [1] constructs a “breadth-first, depth-next” algorithm for building
Random Forest (RF) models. An RF model is a machine learning model that
uses decision trees. Both DFS and BFS approaches are used in machine learning
frameworks. The authors observe that BFS has memory-efficient access patterns
at lower depths. As the depth increases it loses this benefit and accesses memory
in a virtually random pattern. At this point, DFS performs better. As a result, their
algorithm starts with a breadth-first approach and switches to a depth-first
approach once it computes that the access pattern is no longer efficient.

Reference [9] introduces “depth-first iterative-deepening (DFID).” One of
the issues with BFS is that it has exponential memory complexity. DFS can
circumvent this drawback as its memory complexity is linear, but comes with its
own problems. It generally requires some depth bound and a check for repeated
nodes, otherwise the search may not terminate. The actual depth bound needed
may not be knowable at runtime and choosing a bound too low may result in
the search ending without finding the solution. To counteract the downsides of
BFS and DFS, DFID is used. DFID starts with DFS bounded by depth one,
then performs a DFS bounded by depth two, and continues this process with
incrementally larger depth bounds until a solution is found. It must visit the
same nodes multiple times, but it is shown that the runtime complexity is not
affected by this.
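The repeated visits of DFID can be made concrete with a short sketch. The tree, the goal, and the choice to start with a bound of zero edges below the root are assumptions made for this illustration, which counts how often each node is visited.

```python
from collections import Counter

# A complete binary tree of depth 2, given as an adjacency map.
TREE = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}


def dfid(tree, root, goal, max_limit=10):
    visits = Counter()

    def depth_limited(node, limit):
        visits[node] += 1
        if node == goal:
            return True
        if limit == 0:
            return False
        return any(depth_limited(child, limit - 1) for child in tree[node])

    for limit in range(max_limit + 1):
        # Each round restarts from the root with a larger depth bound.
        if depth_limited(root, limit):
            return limit, visits
    return None, visits


limit_used, visits = dfid(TREE, 0, goal=6)
total_visits = sum(visits.values())
```

The root is visited once per round, so the seven-node tree requires eleven visits here; in rewriting modulo SMT each revisit would correspond to a repeated solver call.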

Unfortunately, none of these algorithms seems particularly helpful with
respect to rewriting modulo SMT. For example, prior algorithms [1,5] attempt
to take advantage of memory locality as much as possible. In our case, this would
not give us much of a performance increase. Reference [9] requires nodes to be
visited multiple times. This would lead to duplicate calls to the SMT solver, only
increasing the bottleneck.


Incremental Solving. In reference [10] the authors compare cache-based and
stack-based incremental constraint solving methods in the context of symbolic
execution for test generation. Cache-based incrementality works outside the
solver to cache results and attempt to reuse them. Stack-based incrementality
uses a solver's ability to reuse information learned when solving a subproblem
and the associated push/pop interface. Implementations of the two methods
and a baseline (no incrementality) were compared on a large benchmark set of C
programs and on randomly generated programs. The space of symbolic execution
paths was searched using bounded depth-first search. The authors found
that caching generally increased average solving time over baseline (by a factor
of 2-5 depending on code size), while stack-based methods decreased average
solving time by roughly a factor of 20. This is consistent with our observations
even though the source of the search tree is different and the class of constraints is
different.


Trading Search Space for Constraint Complexity. A notion of guarded term 
is introduced in reference [2] as a method to reduce the search state space in 
symbolic rewriting modulo SMT by replacing non-determinism by disjunction. 
The effect of using guarded terms is demonstrated in a study of the CASH algo- 
rithm for task scheduling. Many properties that could not be checked using sym- 
bolic execution modulo SMT (due to size of search space and timeout) became 
tractable using guarded terms. 

A study of the tradeoff between search space size and constraint size
using symbolic execution modulo SMT in the context of analyzing safety of
autonomous systems such as platooning scenarios is presented in reference [13].
The results in that paper suggest that not only the size of the state space matters
for automation, but also the size of the constraints that are sent to the SMT solver,
as many searches fail to terminate due to non-termination of constraint solving
when constraints get large, while the same searches terminate when disjunctions
are turned into branching in the search space.

None of these approaches, however, investigate the use of incremental SMT 
solving for improving performance of Rewriting Modulo SMT. 
6 Conclusions and Future Work


This paper proposes Incremental Rewrite Theories that enable incremental SMT
solving for rewriting modulo SMT. This is accomplished by the search procedure
hybrid_search, which combines BFS and DFS. The effectiveness of hybrid_search
is demonstrated by using a collection of verification problems taken from the
literature, including algorithm verification, network security analysis, and
cyber-physical systems safety verification. In all examples, the time taken to verify by
hybrid_search improved by a factor of between 5 and 10 when compared to traditional
BFS approaches, showing the great benefits of using incremental solving.

The current notion of incremental rewrite theory is essentially a syntactic
notion, although equational theories are used to reduce terms and matching may
be modulo axioms, such as associativity and commutativity. This makes identifying
the boolean increment efficient and thus well suited for the hybrid algorithm.
An interesting direction for future work is to investigate less restrictive notions
of incremental rewrite theory and identify more general classes of rewrite theories where
incremental solving is effective. Another direction of future work we are investigating
is the trade-off between incremental solving and the shape of constraints, e.g., using
disjunctions to reduce the search space versus splitting disjunctions to reduce SMT solving
time. We are also investigating the incorporation of incremental solving
algorithms in tool implementations such as Maude.
Abstract. The need to verify symbolic computation arises in diverse
application areas. In this paper, based on earlier work on verifying
computation of definite integrals in HolPy, we present a tool Iscalc for
performing a variety of symbolic computations interactively, taking a middle
ground in terms of ease of use and rigor between computer algebra
systems and interactive theorem provers. The tool supports user-level
definitions and dependency among computations, allowing construction and
reuse of custom theories. Side conditions are checked on a best-effort
basis. The tool is applied to highly non-trivial computations from the
textbook Inside Interesting Integrals.


Keywords: Symbolic computation - User interface - Computer algebra 


1 Introduction 


Symbolic computations arise in many mathematical proofs as well as in sci- 
ence and engineering. The use of computers to ensure their correctness is hence 
an important problem. Interactive theorem provers and computer algebra sys- 
tems provide two alternative approaches. Most interactive theorem provers have 
extensive libraries in analysis [6], based upon which one can verify correctness of 
computations with a very high level of confidence. However, the learning curve for 
using such libraries is quite steep. On the other hand, computer algebra systems, 
such as Mathematica, Maple, etc, aim to perform computations automatically. 
However, it is difficult to guide the computation if the automatic procedure fails, 
and the correctness is not fully guaranteed. Indeed there have been examples of 
mistakes made by such computer algebra systems in the past [11]. 

Previous work [18] introduces a system for performing and verifying sym- 
bolic computation as an extension to the HolPy interactive theorem prover [19]. 
The user can perform calculation of definite integrals step-by-step, using rules 


© The Author(s) 2023 
B. Pientka and C. Tinelli (Eds.): CADE 2023, LNAI 14132, pp. 577-589, 2023. 
https://doi.org/10.1007/978-3-031-38499-8_33 


578 B. Zhan et al. 


such as substitution, integration by parts, etc. Each step has a relatively simple 
implementation, and proofs in higher-order logic can be constructed automat- 
ically from the sequence of steps, which in turn can be checked by the HolPy 
kernel. This provides a user experience which can be seen as a mix between the 
two approaches discussed above, combining the more intuitive feel of computer 
algebra systems with higher level of confidence in the results. 

In this paper, we present a significant extension to the work in [18], forming 
an independent tool named Iscalc (Interactive symbolic calculations). In partic- 
ular, we make the following extensions aimed at greater safety, extensibility, and 
ability to handle a wider range of examples. 


1. We introduce user-level definitions and dependency among computations, 
allowing construction and reuse of custom theories. This is achieved by main- 
taining contexts, which contain the list of existing definitions and identities, 
as well as assumptions in the current computation. 

2. We introduce systematic checks on wellformedness of expressions and side- 
conditions for applying certain rules within Iscalc (rather than only when 
reconstructing proofs). This increases confidence in the computation without 
proof reconstruction. 

3. In addition to definite integrals, the tool now supports computation with 
limits, series, and indefinite integrals. We also support improper integrals, 
and many more techniques of computation, such as series expansions and 
differentiating under the integral sign. 

4. With only a few exceptions (such as partial fraction decomposition), all functionalities
are now implemented independently rather than depending on
SymPy. We found this approach, aimed at avoiding problems caused by lim- 
itations of SymPy, to be more flexible and extensible in the end. 


One of our main aims and yardstick for measuring progress is verifying com- 
putations from the textbook Inside Interesting Integrals [17]. This book contains 
many computations of integrals using a variety of techniques, including differen- 
tiating under the integral sign, series expansions, and so on. Many computations 
are quite involved (the longest example we did, Ahmed’s Integral, is 4 pages long 
in the book). We also carry over and complete some of the case studies in [18]. 

Our aim is to provide a user interface that is more intuitive and accessible 
to mathematicians and engineers. In particular, computations are displayed in 
TeX form, and whenever there is tension between conventional mathematical 
language and the more precise formal language, we prefer the former. We take 
the best-effort approach to correctness, providing systematic checks for the usual 
mistakes, such as cancelling expressions that may be zero, or exchange of sums 
that are not absolutely convergent. However, full correctness guarantees in the 
sense of interactive theorem proving are not achieved without proof reconstruction,
which we leave to future work. In this respect, our approach is more similar to 
SMT solvers and program verification tools based on them, which sacrifice some 
correctness guarantees for more efficiency and speed of development. 

We now give an outline for the rest of this paper.¹ Section 2 describes the
overall architecture of Iscalc. Section 3 shows results of case studies, and gives 


1 Source code and examples are available at https://github.com/bzhan/iscalc. 


Iscalc: An Interactive Symbolic Computation Framework 579 


some interesting examples. Section 4 discusses some lessons we took from this 
work, especially for user interface design. Section 4.1 discusses related work and 
Sect. 5 concludes the paper. 


2 Architecture 


Iscalc has a layered architecture consisting of several modules, as shown in Fig. 1. 
In this section, we begin with some preliminary definitions, then describe the 
functionality of each module in turn. 


[Figure 1: layered architecture diagram with the modules Proof methods, Context (Definitions, Inequalities, Induct hyp., Premises) and Algorithms.]


Fig. 1. Overall Architecture


2.1 Preliminaries 


The term language of Iscalc inherits from that in [18], but with extensions for 
limits, summation, and indefinite integrals. The full syntax is as follows. 


e := v |c] e1 op e2 | f(e) | Deriv(e, v) | Integral(e, v, a, b) | 
Limit(e, v, a, dir) | Sum(e, i, a, b) | Indefinitelntegral(e, v, deps) | Skolem(n, deps) 


Constructors on the first line stand for variables, constants, operators, func- 
tion applications, derivatives, and definite integrals, respectively. Constants are 
extended to include positive and negative infinities. Constructors on the second 
line are new, and we explain them in more detail. 

Limit(e, v, a, dir) represents the limit of expression e as variable v goes to 
expression a, where dir represents the direction of the limit. That is, we distinguish 
between $\lim_{x \to 0^+} f(x)$ and $\lim_{x \to 0^-} f(x)$, etc. Sum(e, i, a, b) represents 
summation of expression e as the integer index i goes from a to b (inclusive, except 
when b = ∞). IndefiniteIntegral(e, v, deps) and Skolem(n, deps) are used together 
for computing with indefinite integrals. The former represents the indefinite integral 
of e with respect to v. When this is evaluated to an expression plus “C”, this 
C is represented by a Skolem term. Here deps represents the additional variables 
that C may depend on, which come from the list of dependent variables deps 
of the indefinite integral. The use of dependent variables in evaluating indefinite 
integrals is illustrated by an example in Sect. 3.1. 
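
As a rough illustration of how such a term language might be represented, the following Python sketch encodes a subset of the constructors as immutable dataclasses. The class names, field names, and the helper constant_of_integration are hypothetical and are not taken from the Iscalc source; Deriv, Integral, and Sum are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical rendering of part of the grammar; names are illustrative only.

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: Union[int, float, str]   # strings such as "pi" or "oo" name special constants

@dataclass(frozen=True)
class Op:
    op: str
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Limit:
    body: "Expr"
    var: str
    point: "Expr"
    direction: str                  # "+", "-", or "" for a two-sided limit

@dataclass(frozen=True)
class Skolem:
    name: str
    deps: Tuple[str, ...]

@dataclass(frozen=True)
class IndefiniteIntegral:
    body: "Expr"
    var: str
    deps: Tuple[str, ...]

Expr = Union[Var, Const, Op, Limit, Skolem, IndefiniteIntegral]

def constant_of_integration(integral: IndefiniteIntegral, name: str = "C") -> Skolem:
    """Build the Skolem term standing for "+ C", copying the dependent variables."""
    return Skolem(name, integral.deps)

# Antiderivative of pi/(2a) with respect to a; the constant may depend on b.
integral = IndefiniteIntegral(
    Op("/", Const("pi"), Op("*", Const(2), Var("a"))), "a", ("b",))
c = constant_of_integration(integral)
print(c)   # Skolem(name='C', deps=('b',))
```

Copying deps from the integral to the Skolem term is what later allows the constant to be displayed as C(b) rather than as a bare C.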

Another extension compared to [18] is the addition of formulas. These are 
used to specify goals, wellformedness conditions on terms, as well as assumptions 
on goals and definitions. Currently we support the following constructors for 
formulas:² 


f := e1 op e2 | isInt(e) | notInt(e) | converges(e) 


where the binary operator op is one of =, ≠, <, ≤, >, ≥. isInt(e) and notInt(e) 
state that e is, respectively is not, an integer. converges(e) states that e is 
convergent, where e is a series whose upper limit is ∞. 


2.2 Context 


In [18], computations are independent of one another, and all available 
definitions and identities are built into the kernel. In contrast, Iscalc develops a 
system of user-level definitions and dependencies between computations, similar 
to that of the usual interactive theorem provers. This is achieved by a hierarchy of books, 
files, definitions and goals. Each book consists of an ordered list of axioms, def- 
initions, and files, and may depend on other books. Each file contains a list of 
goals, whose computation may depend on previous items in the book. Each defi- 
nition specifies a new function along with assumptions on the arguments of that 
function. Each axiom or goal specifies a single expression to be proved under a 
set of premises. It may be marked with attributes to specify its type or how it 
is to be used (e.g. whether it can be used during simplification). 

In the implementation, a Context object maintains the list of definitions, 
identities, and inequality rules available at the current file. It also contains the 
premises and inductive hypothesis for the current computation (these are mod- 
ified when performing a case analysis or induction, as described in Sect. 2.5). 
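
A minimal sketch of such a context object is given below, assuming that formulas are stored as plain strings and that each branch of a case analysis receives its own copy of the context. The field and method names are hypothetical and may differ from those in the Iscalc implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for the Context object described above."""
    definitions: Tuple[str, ...] = ()
    identities: Tuple[str, ...] = ()
    inequality_rules: Tuple[str, ...] = ()
    premises: Tuple[str, ...] = ()
    induct_hyp: Optional[str] = None

    def with_premise(self, premise: str) -> "Context":
        """Return a child context extended with one more premise."""
        return replace(self, premises=self.premises + (premise,))

def case_split(ctx: Context, cond: str, negated: str):
    """Case analysis: one child context per branch; the parent is left unchanged."""
    return ctx.with_premise(cond), ctx.with_premise(negated)

base = Context(premises=("cos(a) != 0",))
zero_case, nonzero_case = case_split(base, "x = 0", "x != 0")
print(nonzero_case.premises)   # ('cos(a) != 0', 'x != 0')
```

Making the context immutable is a design choice of this sketch: the premises added for one branch cannot leak into the other branch or into later goals.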


2.3 Algorithms 


Iscalc implements several basic algorithms in computer algebra, for checking 
inequalities, simplification and normalization of expressions, computing limits, 
and solving equations. All of these take a Context object as input, and depend 
on the context information. 


² Currently we do not use logical operators, as negation is unnecessary for the current 
list of formulas, and conjunction and disjunction are represented using internal data 
structures. This may change as new needs arise in the future. 
Inequality Checking. Unlike in the previous paper, condition checking is 
implemented entirely from scratch rather than relying on SymPy. It is well known that 
checking inequalities involving transcendental functions is undecidable. Our goal 
is to perform simple rule-based reasoning automatically, leaving more involved 
inequalities to be proved with user guidance. The overall approach is saturation: 
we maintain a dictionary mapping expressions to conditions on them. Given 
an expression for which we wish to derive some conditions, saturation works 
recursively on each subexpression, matching it against the main argument of 
each rule (left side of inequalities, or the last argument of predicates). For each 
match, it looks in the dictionary for existing facts that justify the assumptions of 
the rule. Special reasoning is performed on numerical constants (e.g. $x < c_1$ can 
be used to justify $x < c_2$ if $c_1 < c_2$). Comparisons between numerical constants 
are currently done with floating-point approximation. 

The approach described here is relatively simple, and it is not difficult to 
ensure termination, as we only get conditions on expressions that already appear. 
However, in practice it can be quite powerful when combined with user-guided 
rewriting, as shown by the example in Sect. 3.2. 
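
To make the saturation idea concrete, the following Python sketch derives sign conditions bottom-up for the rewritten denominator used in Sect. 3.2. It is a simplified illustration under our own assumptions, not the Iscalc algorithm: expressions are nested tuples, only three conditions (">0", ">=0", "!=0") are tracked, and the rule set is hard-coded for squares, products, and sums.

```python
from typing import Dict, Set, Tuple, Union

Expr = Union[str, int, Tuple]   # e.g. ("+", e1, e2), ("*", e1, e2), ("sq", e)
Facts = Dict[Expr, Set[str]]    # expression -> subset of {">0", ">=0", "!=0"}

def close(conds: Set[str]) -> Set[str]:
    """Add obvious consequences: > 0 is equivalent to (>= 0 and != 0)."""
    if ">0" in conds:
        conds |= {">=0", "!=0"}
    if {">=0", "!=0"} <= conds:
        conds.add(">0")
    return conds

def saturate(e: Expr, facts: Facts) -> Set[str]:
    """Derive conditions for e and all of its subexpressions, recording them in facts."""
    conds = set(facts.get(e, set()))
    if isinstance(e, int):
        if e > 0:
            conds.add(">0")
        elif e == 0:
            conds.add(">=0")
    elif isinstance(e, tuple):
        args = [saturate(a, facts) for a in e[1:]]
        if e[0] == "sq":                       # squares are nonnegative
            conds.add(">=0")
            if "!=0" in args[0]:
                conds.add(">0")
        elif e[0] == "*":                      # each condition is closed under products
            for c in (">0", ">=0", "!=0"):
                if all(c in a for a in args):
                    conds.add(c)
        elif e[0] == "+":                      # nonnegative summands, one of them positive
            if all(">=0" in a for a in args):
                conds.add(">=0")
                if any(">0" in a for a in args):
                    conds.add(">0")
    facts[e] = close(conds)
    return facts[e]

# 4 x^2 cos(a)^2 + (x^2 - 1)^2, with premises x != 0 and cos(a) != 0.
denom = ("+",
         ("*", 4, ("*", ("sq", "x"), ("sq", ("cos", "a")))),
         ("sq", ("-", ("sq", "x"), 1)))
facts: Facts = {"x": {"!=0"}, ("cos", "a"): {"!=0"}}
print(">0" in saturate(denom, facts))   # True
```

Since conditions are only ever recorded for subexpressions of the input, the recursion terminates, which matches the termination argument given above.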


Simplification. Simplification of expressions works in mostly the same way 
as [18], and we restate the main ideas. We normalize with respect to the AC 
properties of addition and multiplication, and combine equal terms. When trying 
to combine $t^a t^b$ into $t^{a+b}$, we check using the current context that either t is 
nonzero and a, b are integers, or t is nonnegative. This prevents cancellation of 
e.g. t/t into 1 when t may be zero. 

Moreover, we apply identities in the context that are marked with the simplify 
attribute. These cover evaluation of functions at special values, as well as issues 
like removal of absolute value sign (e.g. |x| = x if x > 0). 


Normalization. There are situations where different forms of an expression are 
desirable for different purposes, e.g. factorized vs. expanded form of a polyno- 
mial, single quotient vs. a sum of quotients, etc. We designed the simplifier to 
not make a choice in such situations. Instead, if the user wishes to convert an 
expression to a different form, she can specify the rewriting explicitly. Iscalc then 
normalizes both the old and the new expression and checks whether they are equal. 
Normalization expands polynomials and combines quotients (e.g. for checking partial 
fraction decomposition), and performs (among others) rewriting of logarithms and 
exponentials. 
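
The check "normalize both sides and compare" can be sketched for the polynomial case as follows. This is only an illustration under simplifying assumptions: it handles integer-coefficient polynomials over a fixed list of atoms, whereas the normalization described above also covers quotients, logarithms, and exponentials. All names are hypothetical. The example is the rewriting step used in Sect. 3.2, with c abbreviating the atom cos(2a).

```python
from collections import defaultdict
from typing import Dict, Tuple

Mono = Tuple[int, ...]      # exponent vector, one entry per atom
Poly = Dict[Mono, int]      # monomial -> nonzero integer coefficient

ATOMS = ("x", "c")          # c abbreviates the atom cos(2a)

def atom(name: str) -> Poly:
    return {tuple(int(a == name) for a in ATOMS): 1}

def const(k: int) -> Poly:
    return {(0,) * len(ATOMS): k} if k else {}

def add(p: Poly, q: Poly) -> Poly:
    out = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            out[mono] += coeff
    return {m: k for m, k in out.items() if k}   # drop cancelled terms

def mul(p: Poly, q: Poly) -> Poly:
    out = defaultdict(int)
    for m1, k1 in p.items():
        for m2, k2 in q.items():
            out[tuple(i + j for i, j in zip(m1, m2))] += k1 * k2
    return {m: k for m, k in out.items() if k}

x, c, one, two = atom("x"), atom("c"), const(1), const(2)
x2 = mul(x, x)
old = add(add(mul(x2, x2), mul(two, mul(x2, c))), one)       # x^4 + 2 x^2 c + 1
diff = add(x2, const(-1))                                     # x^2 - 1
new = add(mul(diff, diff), mul(two, mul(x2, add(one, c))))    # (x^2 - 1)^2 + 2 x^2 (1 + c)
print(old == new)   # True: the user-specified rewriting is accepted
```

Because both sides are reduced to the same canonical dictionary, the comparison does not depend on how the user chose to write the new expression.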


Computing Limits. For limit computations, we implement a simplified version of 
the approach by Gruntz [10]. To compute $\lim_{x \to \infty} e$, we evaluate recursively the 
limit of each subexpression in e, as well as the asymptotics of approaching that 
limit. Possible asymptotics include powers of polynomials and logarithms, as 
well as exponentials. Finding the limit as x approaches other values is converted 
to computing the limit at infinity. 
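
For illustration, the standard substitutions for reducing a limit at a finite point or at negative infinity to a limit at positive infinity are shown below, together with a small worked example. We assume here that the reduction in Iscalc takes this usual form; the exact substitution used in the implementation may differ.

```latex
\lim_{x \to a^{+}} f(x) = \lim_{t \to \infty} f\!\left(a + \tfrac{1}{t}\right), \qquad
\lim_{x \to a^{-}} f(x) = \lim_{t \to \infty} f\!\left(a - \tfrac{1}{t}\right), \qquad
\lim_{x \to -\infty} f(x) = \lim_{t \to \infty} f(-t).

% Worked example:
\lim_{x \to 0^{+}} x \log x
  = \lim_{t \to \infty} \tfrac{1}{t} \log \tfrac{1}{t}
  = \lim_{t \to \infty} \left(-\frac{\log t}{t}\right)
  = 0,
% since t grows faster than \log t as t \to \infty.
```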

As with other algorithms, the aim is not to achieve a high level of automation, 
but to perform the simpler limits, leaving more complex cases to human guidance 
(e.g. using L’Hôpital’s rule or with rewriting). On the other hand, using 
the complete algorithm of Gruntz, or the algorithm implemented by Eberl in 
Isabelle [8], would certainly increase automation and range of applications. 


Solving Equations. We implement simple equation solving, including isolating 
the expression to be solved, and solving linear equations. This is used when 
performing substitutions and in transforming/applying an existing equality. 
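
Isolating an expression can be sketched as undoing one operation at a time on the side that contains the target. The code below is a hypothetical illustration over nested tuples, assuming the target occurs exactly once; it is not the Iscalc solver. Dividing by a factor is recorded as a side condition that would have to be discharged by the inequality checker.

```python
def contains(e, target) -> bool:
    """True if target occurs in e (atoms are strings or numbers)."""
    return e == target or (
        isinstance(e, tuple) and any(contains(a, target) for a in e[1:]))

def isolate(lhs, rhs, target):
    """Turn lhs = rhs into target = rhs'. Returns (rhs', side_conditions).

    Assumes target occurs exactly once in lhs; supports +, - and * only."""
    side = []
    while lhs != target:
        if not isinstance(lhs, tuple):
            raise ValueError("target does not occur in the left-hand side")
        op, l, r = lhs
        in_left = contains(l, target)
        if op == "+":
            lhs, rhs = (l, ("-", rhs, r)) if in_left else (r, ("-", rhs, l))
        elif op == "-":
            lhs, rhs = (l, ("+", rhs, r)) if in_left else (r, ("-", l, rhs))
        elif op == "*":
            other = r if in_left else l
            side.append(("!=", other, 0))     # division needs a nonzero factor
            lhs, rhs = (l if in_left else r), ("/", rhs, other)
        else:
            raise ValueError(f"cannot undo operator {op!r}")
    return rhs, side

# From pi*log(a)/2 + C(a) = 0, solve for C(a) (compare Sect. 3.1).
solution, conditions = isolate(("+", "pi*log(a)/2", "C(a)"), 0, "C(a)")
print(solution)     # ('-', 0, 'pi*log(a)/2')
print(conditions)   # []
```

The resulting right-hand side would then be passed through simplification to obtain the usual form of the answer.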


2.4 Rules 


Based upon the collection of algorithms in the previous section, Iscalc implements 
a set of rules for transforming the current expression in a computation. Currently 
37 rules are available. We give some representative examples below. 


Integration Rules. The list of integration rules is mostly inherited from [18]. 
They include Substitution, IntegrationByParts, etc. Integration identities can be 
applied by lookup from the context. There are also rules for more advanced 
techniques such as differentiating under the integral sign (illustrated in Sect. 3.1), 
and exchange of integral and sum (illustrated in Sect. 3.3). 


Rewriting Rules. The most basic rewriting rule is FullSimplify, which applies 
simplification to the current expression. ApplyIdentity applies an identity from 
the context. This generalizes the use of Fu’s rules for trigonometric identities [9]. 
The rule Equation supports rewriting to another form of an expression with 
equal normal form. Series expansion and evaluation of series are available as two 
different rules (again looking up identities from the context). 


Equality Transformation Rules. These rules transform one equality into another. 
IntegralEquation transforms an equation of the form Deriv(e, x) = g(x) into e = 
IndefiniteIntegral(g, x, fvars), where fvars is the list of free variables in Deriv(e, x). 
Another very flexible rule is SolveEquation, which solves for some expression e in 
an equality s = t to give another equality e = e’. Other examples include taking 
the limit on both sides, applying a function to both sides, and so on. 


Other Rules. Besides the above three major categories, other rules include 
L’Hôpital’s rule for computing limits, and rules for series manipulations. 


2.5 Proof Methods 


In [18], the only way to perform a computation is starting from a single expres- 
sion, and applying rules to transform that expression. More complex applications 
necessitate more structures in the computation. We describe those supported by 
Iscalc briefly, as they are all familiar from other theorem provers. 


Proof by Computation. To show an equality a = b, perform computation on both 
sides until they become identical. Likewise, for inequalities, perform computation 
on both sides until the inequality can be shown automatically. 
Proof by Transformation. Starting from a known equality a = b, apply the 
equality transformation rules in Sect. 2.4 to obtain new equalities, until the desired 
one is obtained. 


Case Analysis. To show a goal, divide into cases either by whether some 
comparison formula is true, or according to whether some expression is less than, equal 
to, or greater than 0. We show an example with inequality goals in Sect. 3.2. 


Induction. Some integrals involve an integer parameter n > 0, and may be 
proved by induction on n. We support such inductive reasoning in Iscalc. The 
rule ApplyInductHyp can be used to apply the inductive hypothesis at any time in 
the inductive branch of the proof. 


2.6 Top-Level Computation, Automation, and User Interface 


Based on the above rules and proof methods, Iscalc supports performing a variety 
of symbolic computations, including showing inequalities, checking convergence, 
evaluating limits, and performing indefinite and definite integrals. It is also 
possible to build higher-level automation on top of the rules. An implementation of 
Slagle’s method is inherited from [18]. It performs best-first search using 
algorithmic and heuristic steps for performing an integral. If the search succeeds, it 
outputs a sequence of rules to apply, which can then be replayed in Iscalc. 
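
The shape of such a search can be sketched as a generic best-first search that returns the names of the rules applied. The code below is a toy illustration, not the inherited implementation of Slagle's method: states are plain strings, the rewrite table REWRITES is made up for the example, and string length stands in for a real heuristic cost.

```python
import heapq
from itertools import count

def best_first(start, is_solved, rules, cost, max_steps=10_000):
    """Return the list of rule names leading from start to a solved state, or None."""
    tie = count()   # tie-breaker, so that states themselves are never compared
    frontier = [(cost(start), next(tie), start, [])]
    seen = {start}
    while frontier and max_steps > 0:
        max_steps -= 1
        _, _, state, path = heapq.heappop(frontier)
        if is_solved(state):
            return path
        for name, rule in rules:
            for nxt in rule(state):
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (cost(nxt), next(tie), nxt, path + [name]))
    return None

# Toy rewrite table (hypothetical): each rule maps a state to its successors.
REWRITES = {
    "IntegrationByParts": {"INT x*exp(x) dx": ["x*exp(x) - INT exp(x) dx"]},
    "Substitution": {"INT x*exp(x) dx": ["INT log(u) du"]},   # dead end in this toy table
    "ElementaryIntegral": {"x*exp(x) - INT exp(x) dx": ["x*exp(x) - exp(x)"]},
}
rules = [(name, lambda s, t=table: t.get(s, [])) for name, table in REWRITES.items()]

plan = best_first("INT x*exp(x) dx", lambda s: "INT" not in s, rules, cost=len)
print(plan)   # ['IntegrationByParts', 'ElementaryIntegral']
```

Returning the sequence of rule names, rather than only the final answer, is what makes it possible to replay the result step by step in the interactive tool.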

The user interface of Iscalc is mostly inherited from [18]. The primary goal 
is to provide a visual interface that feels similar to that of a computer algebra 
system, and which allows mostly point-and-click based interactions. In particu- 
lar, computation steps are performed by selecting rules to apply from the menu. 
For certain rules, the user may need to select a subexpression of the current 
expression to apply the rule on, and/or choose from suggestions given by the 
computer (e.g. when rewriting using identities). 

Additional features in the current work, such as book and file hierarchy, and 
proof methods, are also supported in the user interface. This includes display 
and navigation of book and file contents. To begin the proof of an equation, the 
user selects from the menu one of the proof methods in Sect. 2.5. The structured 
computation is then displayed in a reader-friendly format. An example showing 
display of file contents and a computation is given in Fig. 2. 


3 Examples 


We applied Iscalc to computations of limits, indefinite integrals, and definite 
integrals from a variety of sources. Three sources are inherited from [18]: an 
exam preparation book (Tongji), online problem lists by D. Kouba [13], and the 
MIT Integration Bee [1]. Compared with [18], the range of applicability on these 
problem sets is greater. For example, we can now perform all examples in the 
exponentials and trigonometric categories from D. Kouba’s problem lists, while the previous work 
[Screenshot: the left panel lists one definition and several goals; the right panel shows the proof of I(a,b) = (π log a)/2 + C(b) for a > 0, b > 0 by rewriting the goal, with steps labeled "integrate both side", "full simplify", "apply indefinite integral", and "full simplify".]


Fig. 2. Screenshot of the user interface, showing part of the example given in Sect. 3.1. 
The menu groups related rules into categories. The Proof category contains general 
actions such as proof by calculation and induction. The remaining five menu categories 
contain rewriting rules. The left side of the main window shows the division of the 
computation into several parts, and the right side shows the selected part as a series of 
computation steps. At the bottom (not shown) is space for users to enter additional 
information for a computation step. 


can perform only 7/12 and 22/27 examples respectively, due to limitations of 
SymPy as well as other unsupported features. 

The main additional benchmark comes from the textbook Inside Interesting 
Integrals [17]. 71 integral calculations are performed in Iscalc, covering about 
half the content of the book, including early results about Gamma and zeta 
functions. Many of the remaining examples involve complex numbers and contour 
integration, which are not supported by the current version of the tool. 

Next, we illustrate some special functionality of Iscalc using examples. 
With these examples, we wish to emphasize how the different algorithms and rules 
described in Sects. 2.3 and 2.4 interact with each other, enabling a computation 
process that is very close to human writing. 


3.1 Working with Indefinite Integrals and C 


The goal is to evaluate Frullani’s integral (Sect. 3.3 of [17]): 


$$I(a,b) = \int_0^\infty \frac{\tan^{-1}(ax) - \tan^{-1}(bx)}{x}\,dx$$

under the condition a > 0, b > 0. The computation starts by computing 
$\frac{d}{da} I(a,b) = \frac{\pi}{2a}$, which follows by exchanging derivative and integral, then using 
the formula for the definite integral $\int_0^\infty \frac{1}{a^2x^2+1}\,dx$. The key step is integrating both 
sides of $\frac{d}{da} I(a,b) = \frac{\pi}{2a}$ using rule IntegralEquation to obtain $I(a,b) = \int \frac{\pi}{2a}\,da$, 
which evaluates to 


$$I(a,b) = \frac{\pi \log a}{2} + C(b)$$


Here it is important to keep track of the dependency of the constant in $\int \frac{\pi}{2a}\,da$ 
on the variable b, which is kept in the argument deps of the expression. This 
variable is then shown explicitly as an argument to the Skolem term C when the 
indefinite integral is evaluated. 

Next, replace b by a in the above equation, and from I(a,a) = 0 obtain 
$C(a) = -\frac{\pi \log a}{2}$. Substituting back in the above equation gives the final answer 


$$I(a,b) = \frac{\pi \log a}{2} - \frac{\pi \log b}{2}$$


The entire computation can be carried out in Iscalc much as described above, 
consisting of one definition and four goals, and using 17 rule applications. 
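
For reference, the first step of this computation (exchanging derivative and integral, then evaluating the resulting definite integral) can be written out as follows. This is the standard textbook derivation and is meant only as a worked illustration; the individual steps recorded in Iscalc may be organized differently.

```latex
\frac{\partial}{\partial a}\,\frac{\tan^{-1}(ax) - \tan^{-1}(bx)}{x}
  = \frac{1}{x} \cdot \frac{x}{1 + a^2 x^2}
  = \frac{1}{1 + a^2 x^2},

\frac{d}{da} I(a,b)
  = \int_0^\infty \frac{dx}{1 + a^2 x^2}
  = \frac{1}{a} \Bigl[ \tan^{-1}(ax) \Bigr]_{x=0}^{x \to \infty}
  = \frac{\pi}{2a} \qquad (a > 0).
```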


3.2 Wellformedness Checks 


An example from Sect. 2.3 in [17], illustrating partial fraction decomposition, 
involves computing the following integral: 


$$I(a) = \int_0^\infty \frac{1}{x^4 + 2x^2\cos(2a) + 1}\,dx$$


under the condition cos(a) ≠ 0. One particularly tricky point is that it is not 
obvious why the denominator is always nonzero. This cannot be shown automatically 
by Iscalc. However, we can state a separate goal showing this fact by case 
analysis. One of the steps during the computation involves an integral with the 
same denominator, but with bounds (−∞, ∞), so we perform the check without 
any assumption on x. 

We perform case analysis on whether x is equal to 0. If x = 0 then the goal 
simply reduces to 1 ≠ 0. If x ≠ 0, we rewrite the goal as follows (the name of 
the rule applied is shown at right): 


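$$\begin{aligned}
& x^4 + 2x^2\cos(2a) + 1 \\
={} & (x^2 - 1)^2 + 2x^2(1 + \cos(2a)) && \text{(Equation)} \\
={} & (x^2 - 1)^2 + 2x^2(1 + (2\cos^2(a) - 1)) && \text{(ApplyIdentity)} \\
={} & 4x^2\cos^2(a) + (x^2 - 1)^2 && \text{(FullSimplify)}
\end{aligned}$$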


Now, from x ≠ 0 and cos(a) ≠ 0 we get $4x^2\cos^2(a) > 0$. Also $(x^2 - 1)^2 \ge 0$, so 
the whole expression is greater than zero (and hence nonzero). The inequality 
checking algorithm in Sect. 2.3 is able to perform this reasoning automatically, 
hence showing that the expression in the integral is well-defined. Interestingly, the 
answer $\frac{\pi}{4\cos(a)}$ given in the book is not fully correct. It only holds when 
cos(a) > 0. If cos(a) < 0 the correct answer is $-\frac{\pi}{4\cos(a)}$ (we can easily check 
there is a mistake since the integrand is always positive). 
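
The sign issue can also be checked numerically, independently of Iscalc. The sketch below evaluates the integral of 1/(x^4 + 2x^2 cos(2a) + 1) over [0, ∞) for a value of a with cos(a) < 0. It assumes the substitution x -> 1/x to fold the range [1, ∞) onto [0, 1], and uses a hand-written composite Simpson rule; all function names are ours.

```python
import math

def folded_integrand(t: float, c: float) -> float:
    """Integrand on [0, 1] after folding [1, oo) onto [0, 1] with x -> 1/x."""
    return (1.0 + t * t) / (t ** 4 + 2.0 * c * t * t + 1.0)

def simpson(f, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def integral_value(a: float) -> float:
    c = math.cos(2.0 * a)
    return simpson(lambda t: folded_integrand(t, c), 0.0, 1.0)

a = 2.0                                      # cos(a) < 0
numeric = integral_value(a)
book = math.pi / (4.0 * math.cos(a))         # negative, so it cannot be the value
corrected = -math.pi / (4.0 * math.cos(a))
print(numeric > 0, book < 0, abs(numeric - corrected) < 1e-8)
```

For this value of a the numerical result is positive and agrees with the corrected expression, while the expression from the book is negative.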
3.3 Convergence Checks 


For the final example, we illustrate integration using series, as well as checking 
convergence. The example comes from Sect. 5 of [17]. The goal is to evaluate 


$$\int_0^1 \frac{\log(1 + x)}{x}\,dx$$


The technique used is to expand the Taylor series for log(1 + x) (using rule 
SeriesExpansionIdentity), then exchange integration and summation. During the 
exchange the body of the sum and integral is $\frac{(-1)^n x^n}{n+1}$. As the body changes sign 
for different values of n, there is a potential danger that the sum is not absolutely 
convergent, and the exchange of sum and integral is incorrect even if the final 
answer is finite. To exclude this possibility, Iscalc requires the user to first show 
the convergence of $\sum_{n=0}^\infty \int_0^1 \left|\frac{(-1)^n x^n}{n+1}\right| dx$. This is checked after the computation 


$$\sum_{n=0}^\infty \int_0^1 \left|\frac{(-1)^n x^n}{n+1}\right| dx = \sum_{n=0}^\infty \frac{1}{n+1} \int_0^1 x^n\,dx = \sum_{n=0}^\infty \frac{1}{(n+1)^2}$$


which is convergent by the p-series test implemented within Iscalc. This shows 
the exchange of sum and integral is indeed safe. The final result of the integral 
is $\frac{\pi^2}{12}$, which can be computed in Iscalc using 10 rule applications (including 3 
for showing convergence), assuming the value of some standard infinite series is 
already known. 
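
As a quick numerical sanity check of this example, independent of Iscalc, the two series involved can be summed directly. The sketch assumes the termwise integral of (-1)^n x^n/(n+1) over [0, 1], which is (-1)^n/(n+1)^2, and compares partial sums with the known closed forms π²/12 and π²/6.

```python
import math

def partial_sum(term, n_terms: int) -> float:
    """Sum term(0) + term(1) + ... + term(n_terms - 1)."""
    return sum(term(n) for n in range(n_terms))

N = 100_000
# Series obtained after exchanging sum and integral.
alternating = partial_sum(lambda n: (-1) ** n / (n + 1) ** 2, N)
# Series of absolute values used for the convergence check (a p-series with p = 2).
absolute = partial_sum(lambda n: 1.0 / (n + 1) ** 2, N)

print(abs(alternating - math.pi ** 2 / 12) < 1e-9)
print(abs(absolute - math.pi ** 2 / 6) < 1e-4)
```

The much slower convergence of the series of absolute values (its tail after N terms is roughly 1/N) is why the second tolerance is looser.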


4 Discussion 


While there has been a long line of research on visual user-interfaces for inter- 
active theorem proving, one persistent issue is that they are mostly limited to 
simple examples or narrow application areas. For large scale formalizations, the 
number of actions the user can perform steadily increases, so it becomes more 
and more difficult to organize them in the user interface. Our work can be seen 
as an exploration of how far we can go in the limited, but still wide area of 
symbolic computation. We believe the results are positive. In particular, the 
following design decisions contribute to controlling complexity: 


— Apply rules automatically as much as possible, so they never need to be 
explicitly selected by the user (e.g. normalization and inequality checking). 

— Group related identities into a single rule (e.g. integrals, series expansions, 
etc.). After the user selects one of these rules, Iscalc performs matching on the 
list of available identities and provides choices to the user. 

— Group related rules into categories. For example, rules for evaluating integrals, 
rules for series manipulation, etc. This results in a two-level menu where the 
user may find appropriate rules more easily. 


The end result is that the user does not need to recall names of any existing 
identity (in fact no names are assigned at all). Instead, all results are either 
applied automatically, or selected after matching from a list of suggested choices. 
4.1 Related Work 


There is a large body of work combining theorem proving and symbolic com- 
putation, and in user interface design for theorem provers. Some earlier works 
include Harrison and Théry’s “skeptic’s” approach to invoking computer algebra 
systems from a theorem prover [12], and Bauer et al.’s Analytica [5], which 
implements automatic theorem proving for elementary analysis within Mathematica. 
We leave a detailed review to [18,19]. More recently, Lewis and Wu [14] imple- 
mented a bi-directional interface between Lean [16] and Mathematica. Donato et 
al. designed an interface for constructing proofs using drag-and-drop actions [7]. 

There are also many implementations of proof procedures related to computer 
algebra. Examples include MetiTarski, a tool for proving inequalities by Akbarpour 
and Paulson [2], and the heuristic-based prover Polya by Avigad et al. [4]. For 
computation of limits, Eberl implemented verified computation of asymptotics 
with generated proofs in Isabelle [8]. We do not claim our procedures to be more 
effective than the ones listed above, but focus on their combination with user 
guidance to allow performing more complex symbolic computations. 


5 Conclusion 


In this paper, we introduced Iscalc for performing symbolic computation 
interactively, as a significant extension to the system described in [18]. This results 
in a more extensible tool with a greater range of applicability, in particular one able 
to check difficult computations from the textbook [17], and to find some mistakes 
in the process. 

In future work, we wish to extend the functionality of Iscalc to handle complex 
numbers, multiple integrals, and vector calculus. One particularly interesting 
question is how to support evaluation of contour integrals (the formalization of 
which has been done in Isabelle by Li and Paulson [15]). On the applications 
side, we intend to explore verification of control systems [3]. 

Finally, more work would be required to extend the proof reconstruction 
in [18] to the larger set of functionality available, as well as to link with libraries 
of theorems in analysis. The custom language of expressions defined here is 
independent of any particular choice of logical foundation, hence proof reconstruction 
should be possible in any interactive theorem prover. 